PREFACE


My purpose in this book is to treat linear transformations on finite- 
dimensional vector spaces by the methods of more general theories. The 
idea is to emphasize the simple geometric notions common to many parts 
of mathematics and its applications, and to do so in a language that gives 
away the trade secrets and tells the student what is in the back of the minds 
of people proving theorems about integral equations and Hilbert spaces. 
The reader does not, however, have to share my prejudiced motivation. 
Except for an occasional reference to undergraduate mathematics the book 
is self-contained and may be read by anyone who is trying to get a feeling 
for the linear problems usually discussed in courses on matrix theory or 
“higher” algebra. The algebraic, coordinate-free methods do not lose power 
and elegance by specialization to a finite number of dimensions, and they 
are, in my belief, as elementary as the classical coordinatized treatment. 

I originally intended this book to contain a theorem if and only if an 
infinite-dimensional generalization of it already exists. The tempting 
easiness of some essentially finite-dimensional notions and results was, 
however, irresistible, and in the final result my initial intentions are just 
barely visible. They are most clearly seen in the emphasis, throughout, on 
generalizable methods instead of sharpest possible results. The reader may 
sometimes see some obvious way of shortening the proofs I give. In such 
cases the chances are that the infinite-dimensional analogue of the shorter 
proof is either much longer or else non-existent. 

A preliminary edition of the book (Annals of Mathematics Studies, 
Number 7, first published by the Princeton University Press in 1942) has 
been circulating for several years. In addition to some minor changes in 
style and in order, the difference between the preceding version and this 
one is that the latter contains the following new material: (1) A brief dis- 
cussion of fields, and, in the treatment of vector spaces with inner products, 
special attention to the real case. (2) A definition of determinants in 
invariant terms, via the theory of multilinear forms. (3) Exercises. 

The exercises (well over three hundred of them) constitute the most 
significant addition; I hope that they will be found useful by both student 
and teacher. There are two things about them the reader should know.
First, if an exercise is neither imperative (“prove that . . .”) nor interroga- 
tive (“is it true that . . . ?”) but merely declarative, then it is intended 
as a challenge. For such exercises the reader is asked to discover if the 
assertion is true or false, prove it if true and construct a counterexample if 
false, and, most important of all, discuss such alterations of hypothesis and 
conclusion as will make the true ones false and the false ones true. Second, 
the exercises, whatever their grammatical form, are not always placed so 
as to make their very position a hint to their solution. Frequently exer- 
cises are stated as soon as the statement makes sense, quite a bit before 
machinery for a quick solution has been developed. A reader who tries 
(even unsuccessfully) to solve such a “misplaced” exercise is likely to ap- 
preciate and to understand the subsequent developments much better for 
his attempt. Having in mind possible future editions of the book, I ask 
the reader to let me know about errors in the exercises, and to suggest im- 
provements and additions. (Needless to say, the same goes for the text.) 

None of the theorems and only very few of the exercises are my discovery;
most of them are known to most working mathematicians, and have been 
known for a long time. Although I do not give a detailed list of my sources, 
I am nevertheless deeply aware of my indebtedness to the books and papers 
from which I learned and to the friends and strangers who, before and 
after the publication of the first version, gave me much valuable encourage- 
ment and criticism. I am particularly grateful to three men: J. L. Doob 
and Arlen Brown, who read the entire manuscript of the first and the 
second version, respectively, and made many useful suggestions, and 
John von Neumann, who was one of the originators of the modern spirit
and methods that I have tried to present and whose teaching was the 
inspiration for this book. 


P. R. H. 
SPACES


§ 1. Fields 

In what follows we shall have occasion to use various classes of numbers 
(such as the class of all real numbers or the class of all complex numbers). 
Because we should not, at this early stage, commit ourselves to any specific 
class, we shall adopt the dodge of referring to numbers as scalars. The
reader will not lose anything essential if he consistently interprets scalars 
as real numbers or as complex numbers; in the examples that we shall 
study both classes will occur. To be specific (and also in order to operate 
at the proper level of generality) we proceed to list all the general facts 
about scalars that we shall need to assume. 

(A) To every pair, $\alpha$ and $\beta$, of scalars there corresponds a scalar $\alpha + \beta$,
called the sum of $\alpha$ and $\beta$, in such a way that

(1) addition is commutative, $\alpha + \beta = \beta + \alpha$,

(2) addition is associative, $\alpha + (\beta + \gamma) = (\alpha + \beta) + \gamma$,

(3) there exists a unique scalar $0$ (called zero) such that $\alpha + 0 = \alpha$ for
every scalar $\alpha$, and

(4) to every scalar $\alpha$ there corresponds a unique scalar $-\alpha$ such that
$\alpha + (-\alpha) = 0$.

(B) To every pair, $\alpha$ and $\beta$, of scalars there corresponds a scalar $\alpha\beta$,
called the product of $\alpha$ and $\beta$, in such a way that

(1) multiplication is commutative, $\alpha\beta = \beta\alpha$,

(2) multiplication is associative, $\alpha(\beta\gamma) = (\alpha\beta)\gamma$,

(3) there exists a unique non-zero scalar $1$ (called one) such that $\alpha 1 = \alpha$
for every scalar $\alpha$, and

(4) to every non-zero scalar $\alpha$ there corresponds a unique scalar $\alpha^{-1}$
(or $\frac{1}{\alpha}$) such that $\alpha\alpha^{-1} = 1$.
(C) Multiplication is distributive with respect to addition,
$\alpha(\beta + \gamma) = \alpha\beta + \alpha\gamma$.

If addition and multiplication are defined within some set of objects
(scalars) so that the conditions (A), (B), and (C) are satisfied, then that
set (together with the given operations) is called a field. Thus, for example,
the set $\mathcal{Q}$ of all rational numbers (with the ordinary definitions of sum
and product) is a field, and the same is true of the set $\mathcal{R}$ of all real numbers
and the set $\mathcal{C}$ of all complex numbers.
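
As an illustration of conditions (A), (B), and (C), the following Python sketch checks them by brute force for a small finite system: the integers 0, 1, ..., m - 1 with sums and products reduced to remainders modulo m (the system that Exercise 3 below calls Z_m). The function name `is_field_mod` and the exhaustive-search approach are assumptions made for this illustration, not part of the text; running it on a few values of m gives evidence, not a proof.

```python
from itertools import product


def is_field_mod(m):
    """Check axioms (A), (B), (C) for {0, ..., m-1} with arithmetic mod m."""
    if m < 2:
        return False  # (B3) requires a non-zero scalar 1
    elems = range(m)
    add = lambda a, b: (a + b) % m
    mul = lambda a, b: (a * b) % m

    # Commutativity, associativity, and the distributive law (C).
    for a, b, c in product(elems, repeat=3):
        if add(a, b) != add(b, a) or mul(a, b) != mul(b, a):
            return False
        if add(a, add(b, c)) != add(add(a, b), c):
            return False
        if mul(a, mul(b, c)) != mul(mul(a, b), c):
            return False
        if mul(a, add(b, c)) != add(mul(a, b), mul(a, c)):
            return False

    # (A3) and (B3): a unique zero and a unique non-zero one.
    zeros = [z for z in elems if all(add(a, z) == a for a in elems)]
    ones = [u for u in elems if u != 0 and all(mul(a, u) == a for a in elems)]
    if zeros != [0] or ones != [1]:
        return False

    # (A4) and (B4): unique negatives, and unique inverses of non-zero scalars.
    for a in elems:
        if sum(1 for b in elems if add(a, b) == 0) != 1:
            return False
        if a != 0 and sum(1 for b in elems if mul(a, b) == 1) != 1:
            return False
    return True


print(is_field_mod(5))  # True
print(is_field_mod(6))  # False: 2 has no multiplicative inverse mod 6
```

The only axiom that can fail here is (B4); the others hold for every m of at least 2, which is why the check for inverses is the interesting part.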


EXERCISES

1. Almost all the laws of elementary arithmetic are consequences of the axioms
defining a field. Prove, in particular, that if $\mathcal{F}$ is a field, and if $\alpha$, $\beta$, and $\gamma$ belong
to $\mathcal{F}$, then the following relations hold.

(a) $0 + \alpha = \alpha$.

(b) If $\alpha + \beta = \alpha + \gamma$, then $\beta = \gamma$.

(c) $\alpha + (\beta - \alpha) = \beta$. (Here $\beta - \alpha = \beta + (-\alpha)$.)

(d) $\alpha \cdot 0 = 0 \cdot \alpha = 0$. (For clarity or emphasis we sometimes use the dot to indicate
multiplication.)

(e) $(-1)\alpha = -\alpha$.

(f) $(-\alpha)(-\beta) = \alpha\beta$.

(g) If $\alpha\beta = 0$, then either $\alpha = 0$ or $\beta = 0$ (or both).

2. (a) Is the set of all positive integers a field? (In familiar systems, such as the
integers, we shall almost always use the ordinary operations of addition and
multiplication. On the rare occasions when we depart from this convention, we shall
give ample warning. As for “positive,” by that word we mean, here and elsewhere
in this book, “greater than or equal to zero.” If 0 is to be excluded, we shall say
“strictly positive.”)

(b) What about the set of all integers? 

(c) Can the answers to these questions be changed by re-defining addition or 
multiplication (or both)? 

3. Let $m$ be an integer, $m \geq 2$, and let $\mathcal{Z}_m$ be the set of all positive integers less
than $m$, $\mathcal{Z}_m = \{0, 1, \cdots, m - 1\}$. If $\alpha$ and $\beta$ are in $\mathcal{Z}_m$, let $\alpha + \beta$ be the least
positive remainder obtained by dividing the (ordinary) sum of $\alpha$ and $\beta$ by $m$, and,
similarly, let $\alpha\beta$ be the least positive remainder obtained by dividing the (ordinary)
product of $\alpha$ and $\beta$ by $m$. (Example: if $m = 12$, then $3 + 11 = 2$ and $3 \cdot 11 = 9$.)

(a) Prove that $\mathcal{Z}_m$ is a field if and only if $m$ is a prime.

(b) What is $-1$ in $\mathcal{Z}_6$?

(c) What is $\frac{1}{3}$ in $\mathcal{Z}_7$?

4. The example of $\mathcal{Z}_p$ (where $p$ is a prime) shows that not quite all the laws of
elementary arithmetic hold in fields; in $\mathcal{Z}_2$, for instance, $1 + 1 = 0$. Prove that
if $\mathcal{F}$ is a field, then either the result of repeatedly adding 1 to itself is always
different from 0, or else the first time that it is equal to 0 occurs when the number
of summands is a prime. (The characteristic of the field $\mathcal{F}$ is defined to be 0 in the
first case and the crucial prime in the second.)
5. Let $\mathcal{Q}(\sqrt{2})$ be the set of all real numbers of the form $\alpha + \beta\sqrt{2}$, where
$\alpha$ and $\beta$ are rational.

(a) Is $\mathcal{Q}(\sqrt{2})$ a field?

(b) What if $\alpha$ and $\beta$ are required to be integers?

6. (a) Does the set of all polynomials with integer coefficients form a field? 

(b) What if the coefficients are allowed to be real numbers? 

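7. Let $\mathcal{F}$ be the set of all (ordered) pairs $(\alpha, \beta)$ of real numbers.

(a) If addition and multiplication are defined by

    $(\alpha, \beta) + (\gamma, \delta) = (\alpha + \gamma, \beta + \delta)$

and

    $(\alpha, \beta)(\gamma, \delta) = (\alpha\gamma, \beta\delta)$,

does $\mathcal{F}$ become a field?

(b) If addition and multiplication are defined by

    $(\alpha, \beta) + (\gamma, \delta) = (\alpha + \gamma, \beta + \delta)$

and

    $(\alpha, \beta)(\gamma, \delta) = (\alpha\gamma - \beta\delta, \alpha\delta + \beta\gamma)$,

is $\mathcal{F}$ a field then?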

(c) What happens (in both the preceding cases) if we consider ordered pairs of 
complex numbers instead? 


§ 2. Vector spaces 

We come now to the basic concept of this book. For the definition
that follows we assume that we are given a particular field $\mathcal{F}$; the scalars
to be used are to be elements of $\mathcal{F}$.

Definition. A vector space is a set $\mathcal{V}$ of elements called vectors satisfying
the following axioms.

(A) To every pair, $x$ and $y$, of vectors in $\mathcal{V}$ there corresponds a vector
$x + y$, called the sum of $x$ and $y$, in such a way that

(1) addition is commutative, $x + y = y + x$,

(2) addition is associative, $x + (y + z) = (x + y) + z$,

(3) there exists in $\mathcal{V}$ a unique vector $0$ (called the origin) such that
$x + 0 = x$ for every vector $x$, and

(4) to every vector $x$ in $\mathcal{V}$ there corresponds a unique vector $-x$ such
that $x + (-x) = 0$.

(B) To every pair, $\alpha$ and $x$, where $\alpha$ is a scalar and $x$ is a vector in $\mathcal{V}$,
there corresponds a vector $\alpha x$ in $\mathcal{V}$, called the product of $\alpha$ and $x$, in such
a way that

(1) multiplication by scalars is associative, $\alpha(\beta x) = (\alpha\beta)x$, and

(2) $1x = x$ for every vector $x$.
(C) (1) Multiplication by scalars is distributive with respect to vector
addition, $\alpha(x + y) = \alpha x + \alpha y$, and

(2) multiplication by vectors is distributive with respect to scalar
addition, $(\alpha + \beta)x = \alpha x + \beta x$.

These axioms are not claimed to be logically independent; they are
merely a convenient characterization of the objects we wish to study. The
relation between a vector space $\mathcal{V}$ and the underlying field $\mathcal{F}$ is usually
described by saying that $\mathcal{V}$ is a vector space over $\mathcal{F}$. If $\mathcal{F}$ is the field $\mathcal{R}$
of real numbers, $\mathcal{V}$ is called a real vector space; similarly if $\mathcal{F}$ is $\mathcal{Q}$ or if
$\mathcal{F}$ is $\mathcal{C}$, we speak of rational vector spaces or complex vector spaces.

§ 3. Examples

Before discussing the implications of the axioms, we give some examples. 
We shall refer to these examples over and over again, and we shall use the 
notation established here throughout the rest of our work. 

(1) Let $\mathcal{C}^1$ ($= \mathcal{C}$) be the set of all complex numbers; if we interpret
$x + y$ and $\alpha x$ as ordinary complex numerical addition and multiplication,
$\mathcal{C}^1$ becomes a complex vector space.

(2) Let $\mathcal{P}$ be the set of all polynomials, with complex coefficients, in a
variable $t$. To make $\mathcal{P}$ into a complex vector space, we interpret vector
addition and scalar multiplication as the ordinary addition of two polynomials
and the multiplication of a polynomial by a complex number;
the origin in $\mathcal{P}$ is the polynomial identically zero.

Example (1) is too simple and example (2) is too complicated to be 
typical of the main contents of this book. We give now another example 
of complex vector spaces which (as we shall see later) is general enough for 
all our purposes. 

(3) Let $\mathcal{C}^n$, $n = 1, 2, \cdots$, be the set of all $n$-tuples of complex numbers.
If $x = (\xi_1, \cdots, \xi_n)$ and $y = (\eta_1, \cdots, \eta_n)$ are elements of $\mathcal{C}^n$, we write, by
definition,

    $x + y = (\xi_1 + \eta_1, \cdots, \xi_n + \eta_n)$,
    $\alpha x = (\alpha\xi_1, \cdots, \alpha\xi_n)$,
    $0 = (0, \cdots, 0)$,
    $-x = (-\xi_1, \cdots, -\xi_n)$.

It is easy to verify that all parts of our axioms (A), (B), and (C), § 2, are
satisfied, so that $\mathcal{C}^n$ is a complex vector space; it will be called $n$-dimensional
complex coordinate space.
(4) For each positive integer $n$, let $\mathcal{P}_n$ be the set of all polynomials
(with complex coefficients, as in example (2)) of degree $\leq n - 1$, together
with the polynomial identically zero. (In the usual discussion of degree,
the degree of this polynomial is not defined, so that we cannot say that it
has degree $\leq n - 1$.) With the same interpretation of the linear operations
(addition and scalar multiplication) as in (2), $\mathcal{P}_n$ is a complex vector space.

(5) A close relative of $\mathcal{C}^n$ is the set $\mathcal{R}^n$ of all $n$-tuples of real numbers.
With the same formal definitions of addition and scalar multiplication as
for $\mathcal{C}^n$, except that now we consider only real scalars $\alpha$, the space $\mathcal{R}^n$ is
a real vector space; it will be called $n$-dimensional real coordinate space.

(6) All the preceding examples can be generalized. Thus, for instance,
an obvious generalization of (1) can be described by saying that every
field may be regarded as a vector space over itself. A common generalization
of (3) and (5) starts with an arbitrary field $\mathcal{F}$ and forms the set $\mathcal{F}^n$
of $n$-tuples of elements of $\mathcal{F}$; the formal definitions of the linear operations
are the same as for the case $\mathcal{F} = \mathcal{C}$.

(7) A field, by definition, has at least two elements; a vector space, 
however, may have only one. Since every vector space contains an origin, 
there is essentially (i.e., except for notation) only one vector space having 
only one vector. This most trivial vector space will be denoted by $\mathcal{O}$.

(8) If, in the set $\mathcal{R}$ of all real numbers, addition is defined as usual and
multiplication of a real number by a rational number is defined as usual,
then $\mathcal{R}$ becomes a rational vector space.

(9) If, in the set $\mathcal{C}$ of all complex numbers, addition is defined as usual
and multiplication of a complex number by a real number is defined as
usual, then $\mathcal{C}$ becomes a real vector space. (Compare this example with
(1); they are quite different.)
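
The coordinatewise operations of example (3) can be sketched in Python, representing a vector of complex coordinate space as a tuple of complex numbers. The function names (`vec_add`, `scalar_mul`, `origin`, `neg`) are hypothetical names chosen for this illustration. The assertions spot-check a few of the axioms of § 2 on sample vectors with small integer parts, where floating-point arithmetic is exact; they do not prove the axioms.

```python
def vec_add(x, y):
    """Coordinatewise sum of two n-tuples."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of coordinates")
    return tuple(a + b for a, b in zip(x, y))


def scalar_mul(alpha, x):
    """Product of the scalar alpha and the vector x."""
    return tuple(alpha * a for a in x)


def origin(n):
    """The origin (0, ..., 0) with n coordinates."""
    return tuple(0j for _ in range(n))


def neg(x):
    """The vector -x."""
    return tuple(-a for a in x)


x = (1 + 2j, 3j, -1)
y = (2, 1 - 1j, 4)
alpha, beta = 2 - 1j, 3j

assert vec_add(x, y) == vec_add(y, x)                      # (A1)
assert vec_add(x, origin(3)) == x                          # (A3)
assert vec_add(x, neg(x)) == origin(3)                     # (A4)
assert scalar_mul(alpha, scalar_mul(beta, x)) == scalar_mul(alpha * beta, x)  # (B1)
assert scalar_mul(1, x) == x                               # (B2)
assert scalar_mul(alpha, vec_add(x, y)) == vec_add(
    scalar_mul(alpha, x), scalar_mul(alpha, y))            # (C1)
assert scalar_mul(alpha + beta, x) == vec_add(
    scalar_mul(alpha, x), scalar_mul(beta, x))             # (C2)
```

Restricting `alpha` to real values in the same code gives the real coordinate space of example (5).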

§ 4. Comments

A few comments are in order on our axioms and notation. There are 
striking similarities (and equally striking differences) between the axioms 
for a field and the axioms for a vector space over a field. In both cases, 
the axioms (A) describe the additive structure of the system, the axioms 
(B) describe its multiplicative structure, and the axioms (C) describe the
connection between the two structures. Those familiar with algebraic 
terminology will have recognized the axioms (A) (in both § 1 and § 2) as 
the defining conditions of an abelian (commutative) group; the axioms (B) 
and (C) (in § 2) express the fact that the group admits scalars as operators. 
We mention in passing that if the scalars are elements of a ring (instead 
of a field), the generalized concept corresponding to a vector space is 
called a module. 
Special real vector spaces (such as $\mathcal{R}^2$ and $\mathcal{R}^3$) are familiar in geometry.
There seems at this stage to be no excuse for our apparently uninteresting
insistence on fields other than $\mathcal{R}$, and, in particular, on the field $\mathcal{C}$ of complex
numbers. We hope that the reader is willing to take it on faith that we
shall have to make use of deep properties of complex numbers later
(conjugation, algebraic closure), and that in both the applications of vector
spaces to modern (quantum mechanical) physics and the mathematical
generalization of our results to Hilbert space, complex numbers play an
important role. Their one great disadvantage is the difficulty of drawing
pictures; the ordinary picture (Argand diagram) of $\mathcal{C}^1$ is indistinguishable
from that of $\mathcal{R}^2$, and a graphic representation of $\mathcal{C}^2$ seems to be out of human
reach. On the occasions when we have to use pictorial language we shall
therefore use the terminology of $\mathcal{R}^n$ in $\mathcal{C}^n$, and speak of $\mathcal{C}^2$, for example,
as a plane.

Finally we comment on notation. We observe that the symbol 0 has 
been used in two meanings: once as a scalar and once as a vector. To make 
the situation worse, we shall later, when we introduce linear functionals 
and linear transformations, give it still other meanings. Fortunately the 
relations among the various interpretations of 0 are such that, after this 
word of warning, no confusion should arise from this practice. 


EXERCISES 

1. Prove that if $x$ and $y$ are vectors and if $\alpha$ is a scalar, then the following
relations hold.

(a) $0 + x = x$.

(b) $-0 = 0$.

(c) $\alpha \cdot 0 = 0$.

(d) $0 \cdot x = 0$. (Observe that the same symbol is used on both sides of this
equation; on the left it denotes a scalar, on the right it denotes a vector.)

(e) If $\alpha x = 0$, then either $\alpha = 0$ or $x = 0$ (or both).

(f) $-x = (-1)x$.

(g) $y + (x - y) = x$. (Here $x - y = x + (-y)$.)

2. If $p$ is a prime, then $\mathcal{Z}_p^n$ is a vector space over $\mathcal{Z}_p$ (cf. § 1, Ex. 3); how many
vectors are there in this vector space?

3. Let $\mathcal{V}$ be the set of all (ordered) pairs of real numbers. If $x = (\xi_1, \xi_2)$ and
$y = (\eta_1, \eta_2)$ are elements of $\mathcal{V}$, write

    $x + y = (\xi_1 + \eta_1, \xi_2 + \eta_2)$,
    $\alpha x = (\alpha\xi_1, 0)$,
    $0 = (0, 0)$,
    $-x = (-\xi_1, -\xi_2)$.

Is $\mathcal{V}$ a vector space with respect to these definitions of the linear operations? Why?
4. Sometimes a subset of a vector space is itself a vector space (with respect to
the linear operations already given). Consider, for example, the vector space $\mathcal{C}^3$
and the subsets $\mathcal{V}$ of $\mathcal{C}^3$ consisting of those vectors $(\xi_1, \xi_2, \xi_3)$ for which

(a) $\xi_1$ is real,

(b) $\xi_1 = 0$,

(c) either $\xi_1 = 0$ or $\xi_2 = 0$,

(d) $\xi_1 + \xi_2 = 0$,

(e) $\xi_1 + \xi_2 = 1$.

In which of these cases is V a vector space? 

5. Consider the vector space $\mathcal{P}$ and the subsets $\mathcal{V}$ of $\mathcal{P}$ consisting of those vectors
(polynomials) $x$ for which

(a) $x$ has degree 3,

(b) $2x(0) = x(1)$,

(c) $x(t) \geq 0$ whenever $0 \leq t \leq 1$,

(d) $x(t) = x(1 - t)$ for all $t$.

In which of these cases is V a vector space? 

§ 5. Linear dependence 

Now that we have described the spaces we shall work with, we must 
specify the relations among the elements of those spaces that will be of 
interest to us. 

We begin with a few words about the summation notation. If corresponding
to each of a set of indices $i$ there is given a vector $x_i$, and if it
is not necessary or not convenient to specify the set of indices exactly,
we shall simply speak of a set $\{x_i\}$ of vectors. (We admit the possibility
that the same vector corresponds to two distinct indices. In all honesty,
therefore, it should be stated that what is important is not which vectors
appear in $\{x_i\}$, but how they appear.) If the index-set under consideration
is finite, we shall denote the sum of the corresponding vectors by $\sum_i x_i$
(or, when desirable, by a more explicit symbol such as $\sum_{i=1}^n x_i$). In order
to avoid frequent and fussy case distinctions, it is a good idea to admit
into the general theory sums such as $\sum_i x_i$ even when there are no indices
$i$ to be summed over, or, more precisely, even when the index-set under
consideration is empty. (In that case, of course, there are no vectors to
sum, or, more precisely, the set $\{x_i\}$ is also empty.) The value of such
an “empty sum” is defined, naturally enough, to be the vector 0.

Definition. A finite set $\{x_i\}$ of vectors is linearly dependent if there
exists a corresponding set $\{\alpha_i\}$ of scalars, not all zero, such that

    $\sum_i \alpha_i x_i = 0$.

If, on the other hand, $\sum_i \alpha_i x_i = 0$ implies that $\alpha_i = 0$ for each $i$, the
set $\{x_i\}$ is linearly independent.
The wording of this definition is intended to cover the case of the empty
set; the result in that case, though possibly paradoxical, dovetails very
satisfactorily with the rest of the theory. The result is that the empty
set of vectors is linearly independent. Indeed, if there are no indices $i$,
then it is not possible to pick out some of them and to assign to the selected
ones a non-zero scalar so as to make a certain sum vanish. The trouble
is not in avoiding the assignment of zero; it is in finding an index to which
something can be assigned. Note that this argument shows that the
empty set is not linearly dependent; for the reader not acquainted with
arguing by “vacuous implication,” the equivalence of the definition of
linear independence with the straightforward negation of the definition
of linear dependence needs a little additional intuitive justification. The
easiest way to feel comfortable about the assertion “$\sum_i \alpha_i x_i = 0$ implies
that $\alpha_i = 0$ for each $i$,” in case there are no indices $i$, is to rephrase it this
way: “if $\sum_i \alpha_i x_i = 0$, then there is no index $i$ for which $\alpha_i \neq 0$.” This
version is obviously true if there is no index $i$ at all.

Linear dependence and independence are properties of sets of vectors;
it is customary, however, to apply the adjectives to vectors themselves,
and thus we shall sometimes say “a set of linearly independent vectors”
instead of “a linearly independent set of vectors.” It will be convenient
also to speak of the linear dependence and independence of a not necessarily
finite set, $\mathcal{X}$, of vectors. We shall say that $\mathcal{X}$ is linearly independent if
every finite subset of $\mathcal{X}$ is such; otherwise $\mathcal{X}$ is linearly dependent.

To gain insight into the meaning of linear dependence, let us study the 
examples of vector spaces that we already have. 

(1) If $x$ and $y$ are any two vectors in $\mathcal{C}^1$, then $x$ and $y$ form a linearly
dependent set. If $x = y = 0$, this is trivial; if not, then we have, for
example, the relation $yx + (-x)y = 0$. Since it is clear that every set
containing a linearly dependent subset is itself linearly dependent, this
shows that in $\mathcal{C}^1$ every set containing more than one element is a linearly
dependent set.

(2) More interesting is the situation in the space $\mathcal{P}$. The vectors $x$, $y$,
and $z$, defined by

    $x(t) = 1 - t$,
    $y(t) = t(1 - t)$,
    $z(t) = 1 - t^2$,

are, for example, linearly dependent, since $x + y - z = 0$. However, the
infinite set of vectors $x_0, x_1, x_2, \cdots$, defined by

    $x_0(t) = 1$, $x_1(t) = t$, $x_2(t) = t^2$, $\cdots$,

is a linearly independent set, for if we had any relation of the form

    $\alpha_0 x_0 + \alpha_1 x_1 + \cdots + \alpha_n x_n = 0$,

then we should have a polynomial identity

    $\alpha_0 + \alpha_1 t + \cdots + \alpha_n t^n = 0$,

whence $\alpha_0 = \alpha_1 = \cdots = \alpha_n = 0$.

(3) As we mentioned before, the spaces $\mathcal{C}^n$ are the prototype of what
we want to study; let us examine, for example, the case $n = 3$. To those
familiar with higher-dimensional geometry, the notion of linear dependence
in this space (or, more properly speaking, in its real analogue $\mathcal{R}^3$) has a
concrete geometric meaning, which we shall only mention. In geometrical
language, two vectors are linearly dependent if and only if they are
collinear with the origin, and three vectors are linearly dependent if and
only if they are coplanar with the origin. (If one thinks of a vector not
as a point in a space but as an arrow pointing from the origin to some given
point, the preceding sentence should be modified by crossing out the phrase
“with the origin” both times that it occurs.) We shall presently introduce
the notion of linear manifolds (or vector subspaces) in a vector space, and,
in that connection, we shall occasionally use the language suggested by
such geometrical considerations.
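
For vectors with rational coordinates, linear dependence can be tested mechanically. The sketch below is an illustration under stated assumptions, not the method of the text: it uses row reduction and the notion of rank, machinery that has not been developed at this point, and it assumes the coordinates are rational so that Python's `Fraction` gives exact arithmetic. The names `rank` and `is_linearly_dependent` are hypothetical. The polynomials of example (2) are encoded by their coefficients of 1, t, and t squared.

```python
from fractions import Fraction


def rank(vectors):
    """Number of non-zero rows left after Gaussian elimination."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def is_linearly_dependent(vectors):
    """A finite family is dependent exactly when its rank is below its size."""
    return rank(vectors) < len(vectors)


# x(t) = 1 - t, y(t) = t(1 - t), z(t) = 1 - t^2 as coefficient tuples.
x, y, z = (1, -1, 0), (0, 1, -1), (1, 0, -1)
print(is_linearly_dependent([x, y, z]))                          # True
print(is_linearly_dependent([(1, 0, 0), (0, 1, 0), (0, 0, 1)]))  # False
print(is_linearly_dependent([]))                                 # False
```

The last line agrees with the convention above that the empty set of vectors is linearly independent.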

§ 6. Linear combinations 

We shall say, whenever $x = \sum_i \alpha_i x_i$, that $x$ is a linear combination of
$\{x_i\}$; we shall use without any further explanation all the simple
grammatical implications of this terminology. Thus we shall say, in case $x$
is a linear combination of $\{x_i\}$, that $x$ is linearly dependent on $\{x_i\}$; we
shall leave to the reader the proof that if $\{x_i\}$ is linearly independent,
then a necessary and sufficient condition that $x$ be a linear combination
of $\{x_i\}$ is that the enlarged set, obtained by adjoining $x$ to $\{x_i\}$, be linearly
dependent. Note that, in accordance with the definition of an empty
sum, the origin is a linear combination of the empty set of vectors; it is,
moreover, the only vector with this property.

The following theorem is the fundamental result concerning linear 
dependence. 

Theorem. The set of non-zero vectors $x_1, \cdots, x_n$ is linearly dependent
if and only if some $x_k$, $2 \leq k \leq n$, is a linear combination of the preceding
ones.


proof. Let us suppose that the vectors $x_1, \cdots, x_n$ are linearly dependent,
and let $k$ be the first integer between 2 and $n$ for which $x_1, \cdots, x_k$ are linearly
dependent. (If worse comes to worst, our assumption assures us that
$k = n$ will do.) Then

    $\alpha_1 x_1 + \cdots + \alpha_k x_k = 0$

for a suitable set of $\alpha$'s (not all zero); moreover, whatever the $\alpha$'s, we cannot
have $\alpha_k = 0$, for then we should have a linear dependence relation
among $x_1, \cdots, x_{k-1}$, contrary to the definition of $k$. Hence

    $x_k = \frac{-\alpha_1}{\alpha_k} x_1 + \cdots + \frac{-\alpha_{k-1}}{\alpha_k} x_{k-1}$,

as was to be proved. This proves the necessity of our condition; sufficiency
is clear since, as we remarked before, every set containing a linearly
dependent set is itself such.
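
The index k of the theorem can be located mechanically for vectors with rational coordinates. The sketch below is a hedged illustration: it assumes rational coordinates and uses rank by row reduction, which is not the argument of the proof above, and the names `rank` and `first_dependent_index` are hypothetical. It relies on the observation that a vector is a linear combination of the preceding ones exactly when adjoining it does not raise the rank.

```python
from fractions import Fraction


def rank(vectors):
    """Number of non-zero rows left after Gaussian elimination."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def first_dependent_index(vectors):
    """Smallest k (counting from 1) such that the k-th vector is a linear
    combination of the vectors before it; None if there is no such k."""
    for k in range(1, len(vectors) + 1):
        if rank(vectors[:k]) == rank(vectors[:k - 1]):
            return k
    return None


xs = [(1, 0, 0), (0, 1, 0), (1, 1, 0), (0, 0, 1)]
print(first_dependent_index(xs))      # 3, since x_3 = x_1 + x_2
print(first_dependent_index(xs[:2]))  # None: x_1, x_2 are independent
```

If the first vector is the origin the function returns 1, because the origin is a linear combination of the empty set; the theorem avoids this case by assuming the vectors are non-zero, which forces k to be at least 2.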


§ 7. Bases 

Definition. A (linear) basis (or a coordinate system) in a vector space
$\mathcal{V}$ is a set $\mathcal{X}$ of linearly independent vectors such that every vector in
$\mathcal{V}$ is a linear combination of elements of $\mathcal{X}$. A vector space $\mathcal{V}$ is
finite-dimensional if it has a finite basis.

Except for the occasional consideration of examples we shall restrict 
our attention, throughout this book, to finite-dimensional vector spaces. 

For examples of bases we turn again to the spaces $\mathcal{P}$ and $\mathcal{C}^n$. In $\mathcal{P}$,
the set $\{x_n\}$, where $x_n(t) = t^n$, $n = 0, 1, 2, \cdots$, is a basis; every polynomial
is, by definition, a linear combination of a finite number of $x_n$.
Moreover $\mathcal{P}$ has no finite basis, for, given any finite set of polynomials,
we can find a polynomial of higher degree than any of them; this latter
polynomial is obviously not a linear combination of the former ones.

An example of a basis in $\mathcal{C}^n$ is the set of vectors $x_i$, $i = 1, \cdots, n$, defined
by the condition that the $j$-th coordinate of $x_i$ is $\delta_{ij}$. (Here we use for
the first time the popular Kronecker $\delta$; it is defined by $\delta_{ij} = 1$ if $i = j$ and
$\delta_{ij} = 0$ if $i \neq j$.) Thus we assert that in $\mathcal{C}^3$ the vectors $x_1 = (1, 0, 0)$,
$x_2 = (0, 1, 0)$, and $x_3 = (0, 0, 1)$ form a basis. It is easy to see that they
are linearly independent; the formula

    $x = (\xi_1, \xi_2, \xi_3) = \xi_1 x_1 + \xi_2 x_2 + \xi_3 x_3$

proves that every $x$ in $\mathcal{C}^3$ is a linear combination of them.

In a general finite-dimensional vector space $\mathcal{V}$, with basis $\{x_1, \cdots, x_n\}$,
we know that every $x$ can be written in the form

    $x = \sum_i \xi_i x_i$;

we assert that the $\xi$'s are uniquely determined by $x$. The proof of this
assertion is an argument often used in the theory of linear dependence.
If we had $x = \sum_i \eta_i x_i$, then we should have, by subtraction,

    $\sum_i (\xi_i - \eta_i) x_i = 0$.

Since the $x_i$ are linearly independent, this implies that $\xi_i - \eta_i = 0$ for
$i = 1, \cdots, n$; in other words, the $\xi$'s are the same as the $\eta$'s. (Observe
that writing $\{x_1, \cdots, x_n\}$ for a basis with $n$ elements is not the proper thing
to do in case $n = 0$. We shall, nevertheless, frequently use this notation.
Whenever that is done, it is, in principle, necessary to adjoin a separate
discussion designed to cover the vector space $\mathcal{O}$. In fact, however,
everything about that space is so trivial that the details are not worth writing
down, and we shall omit them.)

Theorem. If $\mathcal{V}$ is a finite-dimensional vector space and if $\{y_1, \cdots, y_m\}$
is any set of linearly independent vectors in $\mathcal{V}$, then, unless the $y$'s already
form a basis, we can find vectors $y_{m+1}, \cdots, y_{m+p}$ so that the totality of the
$y$'s, that is, $\{y_1, \cdots, y_m, y_{m+1}, \cdots, y_{m+p}\}$, is a basis. In other words, every
linearly independent set can be extended to a basis.

proof. Since $\mathcal{V}$ is finite-dimensional, it has a finite basis, say $\{x_1, \cdots,
x_n\}$. We consider the set $\mathcal{S}$ of vectors

    $y_1, \cdots, y_m, x_1, \cdots, x_n$,

in this order, and we apply to this set the theorem of § 6 several times in
succession. In the first place, the set $\mathcal{S}$ is linearly dependent, since the
$y$'s are (as are all vectors) linear combinations of the $x$'s. Hence some
vector of $\mathcal{S}$ is a linear combination of the preceding ones; let $z$ be the first
such vector. Then $z$ is different from any $y_i$, $i = 1, \cdots, m$ (since the
$y$'s are linearly independent), so that $z$ is equal to some $x$, say $z = x_i$.
We consider the new set $\mathcal{S}'$ of vectors

    $y_1, \cdots, y_m, x_1, \cdots, x_{i-1}, x_{i+1}, \cdots, x_n$.

We observe that every vector in $\mathcal{V}$ is a linear combination of vectors in
$\mathcal{S}'$, since by means of $y_1, \cdots, y_m, x_1, \cdots, x_{i-1}$ we may express $x_i$, and
then by means of $x_1, \cdots, x_{i-1}, x_i, x_{i+1}, \cdots, x_n$ we may express any vector.
(The $x$'s form a basis.) If $\mathcal{S}'$ is linearly independent, we are done. If
it is not, we apply the theorem of § 6 again and again the same way till
we reach a linearly independent set containing $y_1, \cdots, y_m$, in terms of
which we may express every vector in $\mathcal{V}$. This last set is a basis containing
the $y$'s.
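
The procedure in this proof can be sketched in Python for vectors with rational coordinates. The sketch is a single-pass variant rather than a literal transcription: it walks through the list of y's followed by the x's and keeps a vector only if it is not a linear combination of the vectors already kept, which discards the same vectors as repeatedly removing the first one that depends on its predecessors. The rank test by row reduction and the names `rank` and `extend_to_basis` are assumptions made for this illustration.

```python
from fractions import Fraction


def rank(vectors):
    """Number of non-zero rows left after Gaussian elimination."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def extend_to_basis(ys, xs):
    """Extend the linearly independent list ys to a basis, drawing the
    additional vectors from the basis (or spanning list) xs."""
    kept = list(ys)
    for x in xs:
        # kept is independent, so its rank equals its length; x is kept
        # exactly when it is not a linear combination of the kept vectors.
        if rank(kept + [x]) > len(kept):
            kept.append(x)
    return kept


ys = [(1, 1, 0)]
xs = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
print(extend_to_basis(ys, xs))  # [(1, 1, 0), (1, 0, 0), (0, 0, 1)]
```

In the example the vector (0, 1, 0) is dropped because it equals (1, 1, 0) minus (1, 0, 0), which is the role played by z in the proof.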
EXERCISES

1. (a) Prove that the four vectors

    $x = (1, 0, 0)$,
    $y = (0, 1, 0)$,
    $z = (0, 0, 1)$,
    $u = (1, 1, 1)$,

in $\mathcal{C}^3$ form a linearly dependent set, but any three of them are linearly independent.
(To test the linear dependence of vectors $x = (\xi_1, \xi_2, \xi_3)$, $y = (\eta_1, \eta_2, \eta_3)$, and
$z = (\zeta_1, \zeta_2, \zeta_3)$ in $\mathcal{C}^3$, proceed as follows. Assume that $\alpha$, $\beta$, and $\gamma$ can be found
so that $\alpha x + \beta y + \gamma z = 0$. This means that

    $\alpha\xi_1 + \beta\eta_1 + \gamma\zeta_1 = 0$,
    $\alpha\xi_2 + \beta\eta_2 + \gamma\zeta_2 = 0$,
    $\alpha\xi_3 + \beta\eta_3 + \gamma\zeta_3 = 0$.

The vectors $x$, $y$, and $z$ are linearly dependent if and only if these equations have a
solution other than $\alpha = \beta = \gamma = 0$.)

(b) If the vectors $x$, $y$, $z$, and $u$ in $\mathcal{P}$ are defined by $x(t) = 1$, $y(t) = t$, $z(t) = t^2$,
and $u(t) = 1 + t + t^2$, prove that $x$, $y$, $z$, and $u$ are linearly dependent, but any
three of them are linearly independent.

2. Prove that if $\mathcal{R}$ is considered as a rational vector space (see § 3, (8)), then a
necessary and sufficient condition that the vectors 1 and $\xi$ in $\mathcal{R}$ be linearly
independent is that the real number $\xi$ be irrational.

3. Is it true that if $x$, $y$, and $z$ are linearly independent vectors, then so also are
$x + y$, $y + z$, and $z + x$?

4. (a) Under what conditions on the scalar $\xi$ are the vectors $(1 + \xi, 1 - \xi)$
and $(1 - \xi, 1 + \xi)$ in $\mathcal{C}^2$ linearly dependent?

(b) Under what conditions on the scalar $\xi$ are the vectors $(\xi, 1, 0)$, $(1, \xi, 1)$,
and $(0, 1, \xi)$ in $\mathcal{R}^3$ linearly dependent?

(c) What is the answer to (b) for $\mathcal{Q}^3$ (in place of $\mathcal{R}^3$)?

5. (a) The vectors $(\xi_1, \xi_2)$ and $(\eta_1, \eta_2)$ in $\mathcal{C}^2$ are linearly dependent if and only if
$\xi_1\eta_2 = \xi_2\eta_1$.

(b) Find a similar necessary and sufficient condition for the linear dependence
of two vectors in $\mathcal{C}^3$. Do the same for three vectors in $\mathcal{C}^3$.

(c) Is there a set of three linearly independent vectors in $\mathcal{C}^2$?

6. (a) Under what conditions on the scalars $\xi$ and $\eta$ are the vectors $(1, \xi)$ and
$(1, \eta)$ in $\mathcal{C}^2$ linearly dependent?

(b) Under what conditions on the scalars $\xi$, $\eta$, and $\zeta$ are the vectors $(1, \xi, \xi^2)$,
$(1, \eta, \eta^2)$, and $(1, \zeta, \zeta^2)$ in $\mathcal{C}^3$ linearly dependent?

(c) Guess and prove a generalization of (a) and (b) to $\mathcal{C}^n$.

7. (a) Find two bases in $\mathcal{C}^4$ such that the only vectors common to both are
(0, 0, 1, 1) and (1, 1, 0, 0).

(b) Find two bases in $\mathcal{C}^4$ that have no vectors in common so that one of them
contains the vectors (1, 0, 0, 0) and (1, 1, 0, 0) and the other one contains the
vectors (1, 1, 1, 0) and (1, 1, 1, 1).

8. (a) Under what conditions on the scalar $\xi$ do the vectors $(1, 1, 1)$ and $(1, \xi, \xi^2)$
form a basis of $\mathcal{C}^3$?

(b) Under what conditions on the scalar $\xi$ do the vectors $(0, 1, \xi)$, $(\xi, 0, 1)$, and
$(\xi, 1, 1 + \xi)$ form a basis of $\mathcal{C}^3$?

9. Consider the set of all those vectors in $\mathcal{C}^3$ each of whose coordinates is either
0 or 1; how many different bases does this set contain?

10. If $\mathcal{X}$ is the set consisting of the six vectors $(1, 1, 0, 0)$, $(1, 0, 1, 0)$, $(1, 0, 0, 1)$,
$(0, 1, 1, 0)$, $(0, 1, 0, 1)$, $(0, 0, 1, 1)$ in $\mathcal{C}^4$, find two different maximal linearly
independent subsets of $\mathcal{X}$. (A maximal linearly independent subset of $\mathcal{X}$ is a linearly
independent subset $\mathcal{Y}$ of $\mathcal{X}$ that becomes linearly dependent every time that a vector
of $\mathcal{X}$ that is not already in $\mathcal{Y}$ is adjoined to $\mathcal{Y}$.)

11. Prove that every vector space has a basis. (The proof of this fact is out of 
reach for those not acquainted with some transfinite trickery, such as well-ordering 
or Zorn’s lemma.) 
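
The test for linear dependence described in Exercise 1 (a) can be mechanized. The following is a minimal sketch, not part of the original text: it assumes rational coordinates (so that the arithmetic is exact), and the names `rank` and `linearly_dependent` are ours. A finite set of vectors is linearly dependent exactly when elimination leaves fewer non-zero rows than there were vectors.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of a list of vectors, by Gaussian elimination over the rationals."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    n_cols = len(rows[0]) if rows else 0
    found = 0  # number of pivots found so far
    for col in range(n_cols):
        # Look for a row, at or below the pivot position, that is non-zero in this column.
        pivot = next((r for r in range(found, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        # Clear the entries below the pivot.
        for r in range(found + 1, len(rows)):
            factor = rows[r][col] / rows[found][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[found])]
        found += 1
    return found


def linearly_dependent(vectors):
    """True if some non-trivial linear combination of the vectors vanishes."""
    return rank(vectors) < len(vectors)


print(linearly_dependent([(1, 2, 3), (4, 5, 6), (7, 8, 9)]))  # True: x - 2y + z = 0
print(linearly_dependent([(1, 0, 0), (0, 1, 0), (1, 1, 1)]))  # False
```

The two sample sets are illustrations chosen for this sketch; they are not taken from the exercises.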


§ 8. Dimension

Theorem 1. The number of elements in any basis of a finite-dimensional
vector space $\mathcal{V}$ is the same as in any other basis.

Proof. The proof of this theorem is a slight refinement of the method
used in § 6, and, incidentally, it proves something more than the theorem
states. Let $\mathcal{X} = \{x_1, \dots, x_n\}$ and $\mathcal{Y} = \{y_1, \dots, y_m\}$ be two finite sets
of vectors, each with one of the two defining properties of a basis; i.e., we
assume that every vector in $\mathcal{V}$ is a linear combination of the $x$'s (but not
that the $x$'s are linearly independent), and we assume that the $y$'s are
linearly independent (but not that every vector is a linear combination
of them). We may apply the theorem of § 6, just as above, to the set $\mathcal{S}$
of vectors

$$y_m, x_1, \dots, x_n.$$

Again we know that every vector is a linear combination of vectors of
$\mathcal{S}$ and that $\mathcal{S}$ is linearly dependent. Reasoning just as before, we obtain
a set $\mathcal{S}'$ of vectors

$$y_m, x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n,$$

again with the property that every vector is a linear combination of vectors
of $\mathcal{S}'$. Now we write $y_{m-1}$ in front of the vectors of $\mathcal{S}'$ and apply the same
argument. Continuing in this way, we see that the $x$'s will not be exhausted
before the $y$'s, since otherwise the remaining $y$'s would have to be linear
combinations of the ones already incorporated into $\mathcal{S}$, whereas we know
that the $y$'s are linearly independent. In other words, after the argument
has been applied $m$ times, we obtain a set with the same property the
$x$'s had, and this set differs from the set of $x$'s in that $m$ of them are
replaced by $y$'s. This seemingly innocent statement is what we are after;
it implies that $n \geq m$. Consequently if both $\mathcal{X}$ and $\mathcal{Y}$ are bases (so that
they each have both properties), then $n \geq m$ and $m \geq n$.

Definition. The dimension of a finite-dimensional vector space $\mathcal{V}$ is
the number of elements in a basis of $\mathcal{V}$.

Observe that since the empty set of vectors is a basis of the trivial
space $\mathcal{O}$, the definition implies that that space has dimension 0. At the
same time the definition (together with the fact that we have already
exhibited, in § 7, one particular basis of $\mathcal{C}^n$) at last justifies our terminology
and enables us to announce the pleasant result: $n$-dimensional coordinate
space is $n$-dimensional. (Since the argument is the same for $\mathcal{R}^n$ and for
$\mathcal{C}^n$, the assertion is true in both the real case and the complex case.)

Our next result is a corollary of Theorem 1 (via the theorem of § 7). 

Theorem 2. Every set of $n + 1$ vectors in an $n$-dimensional vector space
$\mathcal{V}$ is linearly dependent. A set of $n$ vectors in $\mathcal{V}$ is a basis if and only if it is
linearly independent, or, alternatively, if and only if every vector in $\mathcal{V}$
is a linear combination of elements of the set.


§ 9. Isomorphism 

As an application of the notion of linear basis, or coordinate system, 
we shall now fulfill an implicit earlier promise by showing that every 
finite-dimensional vector space over a field is essentially the same as 
(in technical language, is isomorphic to) some $\mathcal{F}^n$.

Definition. Two vector spaces $\mathcal{U}$ and $\mathcal{V}$ (over the same field) are
isomorphic if there is a one-to-one correspondence between the vectors
$x$ of $\mathcal{U}$ and the vectors $y$ of $\mathcal{V}$, say $y = T(x)$, such that

$$T(\alpha_1 x_1 + \alpha_2 x_2) = \alpha_1 T(x_1) + \alpha_2 T(x_2).$$

In other words, $\mathcal{U}$ and $\mathcal{V}$ are isomorphic if there is an isomorphism (such
as $T$) between them, where an isomorphism is a one-to-one correspondence
that preserves all linear relations.

It is easy to see that isomorphic finite-dimensional vector spaces have 
the same dimension; to each basis in one space there corresponds a basis 
in the other space. Thus dimension is an isomorphism invariant; we shall 
now show that it is the only isomorphism invariant, in the sense that every
two vector spaces with the same finite dimension (over the same field, of
course) are isomorphic. Since the isomorphism of $\mathcal{U}$ and $\mathcal{V}$ on the one
hand, and of $\mathcal{V}$ and $\mathcal{W}$ on the other hand, implies that $\mathcal{U}$ and $\mathcal{W}$ are
isomorphic, it will be sufficient to prove the following theorem.

Theorem. Every $n$-dimensional vector space $\mathcal{V}$ over a field $\mathcal{F}$ is isomorphic
to $\mathcal{F}^n$.

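Proof. Let $\{x_1, \dots, x_n\}$ be any basis in $\mathcal{V}$. Each $x$ in $\mathcal{V}$ can be written
in the form $\xi_1 x_1 + \cdots + \xi_n x_n$, and we know that the scalars $\xi_1, \dots, \xi_n$
are uniquely determined by $x$. We consider the one-to-one correspondence

$$x \rightleftarrows (\xi_1, \dots, \xi_n)$$

between $\mathcal{V}$ and $\mathcal{F}^n$. If $y = \eta_1 x_1 + \cdots + \eta_n x_n$, then

$$\alpha x + \beta y = (\alpha\xi_1 + \beta\eta_1)x_1 + \cdots + (\alpha\xi_n + \beta\eta_n)x_n;$$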

this establishes the desired isomorphism. 
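
The correspondence used in the proof can be tried out numerically. The sketch below is an illustration only: it takes $\mathcal{V}$ to be the space of pairs of rational numbers with the (arbitrarily chosen) basis $x_1 = (1, 1)$, $x_2 = (1, -1)$, and the helper name `coordinates` is ours. It computes the scalars $\xi_1, \dots, \xi_n$ by elimination and checks, for one choice of $\alpha$, $\beta$, $x$, $y$, that the coordinates of $\alpha x + \beta y$ are $\alpha\xi_i + \beta\eta_i$.

```python
from fractions import Fraction


def coordinates(basis, x):
    """Coordinates of x relative to `basis`, by Gauss-Jordan elimination.

    Assumes that `basis` really is a basis (n vectors with n rational coordinates).
    """
    n = len(basis)
    # Augmented matrix: the columns are the basis vectors, followed by x.
    rows = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(x[i])]
            for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        p = rows[col][col]
        rows[col] = [a / p for a in rows[col]]
        for r in range(n):
            if r != col:
                f = rows[r][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[col])]
    return tuple(row[n] for row in rows)


basis = [(1, 1), (1, -1)]
x, y = (3, 1), (0, 2)
alpha, beta = 2, 3

combo = tuple(alpha * a + beta * b for a, b in zip(x, y))   # alpha*x + beta*y
lhs = coordinates(basis, combo)
rhs = tuple(alpha * s + beta * t
            for s, t in zip(coordinates(basis, x), coordinates(basis, y)))
print(lhs == rhs)  # True
```

A single numerical check is, of course, no substitute for the proof; it merely shows the correspondence at work.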

One might be tempted to say that from now on it would be silly to try 
to preserve an appearance of generality by talking of the general n-di- 
mensional vector space, since we know that, from the point of view of 
studying linear problems, isomorphic vector spaces are indistinguishable, 
and, consequently, we might as well always study $\mathcal{F}^n$. There is one catch.
The most important properties of vectors and vector spaces are the ones 
that are independent of coordinate systems, or, in other words, the ones 
that are invariant under isomorphisms. The correspondence between $\mathcal{V}$
and $\mathcal{F}^n$ was, however, established by choosing a coordinate system; were
we always to study $\mathcal{F}^n$, we would always be tied down to that particular
coordinate system, or else we would always be faced with the chore of 
showing that our definitions and theorems are independent of the co- 
ordinate system in which they happen to be stated. (This horrible dilemma 
will become clear later, on the few occasions when we shall be forced to 
use a particular coordinate system to give a definition.) Accordingly, 
in the greater part of this book, we shall ignore the theorem just proved, 
and we shall treat n-dimensional vector spaces as self-respecting entities, 
independently of any basis. Besides the reasons just mentioned, there is 
another reason for doing this: many special examples of vector spaces, 
such for instance as $\mathcal{P}_n$, would lose a lot of their intuitive content if we were
to transform them into $\mathcal{C}^n$ and speak of coordinates only. In studying
vector spaces, such as $\mathcal{P}_n$, and their relation to other vector spaces, we
must be able to handle them with equal ease in different coordinate systems,
or, and this is essentially the same thing, we must be able to handle them
without using any coordinate systems at all.


EXERCISES

1. (a) What is the dimension of the set $\mathcal{C}$ of all complex numbers considered
as a real vector space? (See § 3, (9).)

(b) Every complex vector space $\mathcal{V}$ is intimately associated with a real vector
space $\mathcal{V}^-$; the space $\mathcal{V}^-$ is obtained from $\mathcal{V}$ by refusing to multiply vectors of $\mathcal{V}$
by anything other than real scalars. If the dimension of the complex vector space
$\mathcal{V}$ is $n$, what is the dimension of the real vector space $\mathcal{V}^-$?

2. Is the set $\mathcal{R}$ of all real numbers a finite-dimensional vector space over the
field $\mathcal{Q}$ of all rational numbers? (See § 3, (8). The question is not trivial; it helps
to know something about cardinal numbers.)

3. How many vectors are there in an $n$-dimensional vector space over the field
$\mathcal{Z}_p$ (where $p$ is a prime)?

4. Discuss the following assertion: if two rational vector spaces have the same 
cardinal number (i.e., if there is some one-to-one correspondence between them), 
then they are isomorphic (i.e., there is a linearity-preserving one-to-one correspond- 
ence between them). A knowledge of the basic facts of cardinal arithmetic is 
needed for an intelligent discussion. 


§ 10. Subspaces

The objects of interest in geometry are not only the points of the space 
under consideration, but also its lines, planes, etc. We proceed to study 
the analogues, in general vector spaces, of these higher-dimensional ele- 
ments. 

Definition. A non-empty subset $\mathcal{M}$ of a vector space $\mathcal{V}$ is a subspace
or a linear manifold if along with every pair, $x$ and $y$, of vectors contained
in $\mathcal{M}$, every linear combination $\alpha x + \beta y$ is also contained in $\mathcal{M}$.

A word of warning: along with each vector x, a subspace also contains 
$x - x$. Hence if we interpret subspaces as generalized lines and planes,
we must be careful to consider only lines and planes that pass through the 
origin. 

A subspace $\mathcal{M}$ in a vector space $\mathcal{V}$ is itself a vector space; the reader
can easily verify that, with the same definitions of addition and scalar
multiplication as we had in $\mathcal{V}$, the set $\mathcal{M}$ satisfies the axioms (A), (B), and (C)
of § 2.

Two special examples of subspaces are: (i) the set $\mathcal{O}$ consisting of the
origin only, and (ii) the whole space $\mathcal{V}$. The following examples are less
trivial.

(1) Let $n$ and $m$ be any two strictly positive integers, $m \leq n$. Let $\mathcal{M}$
be the set of all vectors $x = (\xi_1, \dots, \xi_n)$ in $\mathcal{C}^n$ for which $\xi_1 = \cdots = \xi_m = 0$.

(2) With $m$ and $n$ as in (1), we consider the space $\mathcal{P}_n$, and any $m$ real
numbers $t_1, \dots, t_m$. Let $\mathcal{M}$ be the set of all vectors (polynomials) $x$ in
$\mathcal{P}_n$ for which $x(t_1) = \cdots = x(t_m) = 0$.

(3) Let $\mathcal{M}$ be the set of all vectors $x$ in $\mathcal{P}$ for which $x(t) = x(-t)$ holds
identically in $t$.

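We need some notation and some terminology. For any collection
$\{\mathcal{M}_\nu\}$ of subsets of a given set (say, for example, for a collection of
subspaces in a vector space $\mathcal{V}$), we write $\bigcap_\nu \mathcal{M}_\nu$ for the intersection of all
$\mathcal{M}_\nu$, i.e., for the set of points common to them all. Also, if $\mathcal{M}$ and $\mathcal{N}$ are
subsets of a set, we write $\mathcal{M} \subset \mathcal{N}$ if $\mathcal{M}$ is a subset of $\mathcal{N}$, that is, if every
element of $\mathcal{M}$ lies in $\mathcal{N}$ also. (Observe that we do not exclude the possibility
$\mathcal{M} = \mathcal{N}$; thus we write $\mathcal{V} \subset \mathcal{V}$ as well as $\mathcal{O} \subset \mathcal{V}$.) For a finite collection
$\{\mathcal{M}_1, \dots, \mathcal{M}_n\}$, we shall write $\mathcal{M}_1 \cap \cdots \cap \mathcal{M}_n$ in place of $\bigcap_j \mathcal{M}_j$; in case
two subspaces $\mathcal{M}$ and $\mathcal{N}$ are such that $\mathcal{M} \cap \mathcal{N} = \mathcal{O}$, we shall say that
$\mathcal{M}$ and $\mathcal{N}$ are disjoint.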


§ 11. Calculus of subspaces 

Theorem 1. The intersection of any collection of subspaces is a subspace.

Proof. If we use an index $\nu$ to tell apart the members of the collection,
so that the given subspaces are $\mathcal{M}_\nu$, let us write

$$\mathcal{M} = \bigcap_\nu \mathcal{M}_\nu.$$

Since every $\mathcal{M}_\nu$ contains $0$, so does $\mathcal{M}$, and therefore $\mathcal{M}$ is not empty. If
$x$ and $y$ belong to $\mathcal{M}$ (that is, to all $\mathcal{M}_\nu$), then $\alpha x + \beta y$ belongs to all $\mathcal{M}_\nu$,
and therefore $\mathcal{M}$ is a subspace.

To see an application of this theorem, suppose that $\mathcal{S}$ is an arbitrary set
of vectors (not necessarily a subspace) in a vector space $\mathcal{V}$. There certainly
exist subspaces $\mathcal{M}$ containing every element of $\mathcal{S}$ (that is, such that $\mathcal{S} \subset \mathcal{M}$);
the whole space $\mathcal{V}$ is, for example, such a subspace. Let $\mathcal{M}$ be the
intersection of all the subspaces containing $\mathcal{S}$; it is clear that $\mathcal{M}$ itself is a
subspace containing $\mathcal{S}$. It is clear, moreover, that $\mathcal{M}$ is the smallest such
subspace; if $\mathcal{S}$ is also contained in the subspace $\mathcal{N}$, $\mathcal{S} \subset \mathcal{N}$, then $\mathcal{M} \subset \mathcal{N}$.
The subspace $\mathcal{M}$ so defined is called the subspace spanned by $\mathcal{S}$ or the span
of $\mathcal{S}$. The following result establishes the connection between the notion
of spanning and the concepts studied in §§ 5-9.

Theorem 2. If $\mathcal{S}$ is any set of vectors in a vector space $\mathcal{V}$ and if $\mathcal{M}$ is the
subspace spanned by $\mathcal{S}$, then $\mathcal{M}$ is the same as the set of all linear combinations
of elements of $\mathcal{S}$.

Proof. It is clear that a linear combination of linear combinations
of elements of $\mathcal{S}$ may again be written as a linear combination of elements
of $\mathcal{S}$. Hence the set of all linear combinations of elements of $\mathcal{S}$ is a
subspace containing $\mathcal{S}$; it follows that this subspace must also contain $\mathcal{M}$.
Now turn the argument around: $\mathcal{M}$ contains $\mathcal{S}$ and is a subspace; hence $\mathcal{M}$
contains all linear combinations of elements of $\mathcal{S}$.

We see therefore that in our new terminology we may define a linear 
basis as a set of linearly independent vectors that spans the whole space. 

Our next result is an easy consequence of Theorem 2; its proof may be 
safely left to the reader. 

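Theorem 3. If $\mathcal{H}$ and $\mathcal{K}$ are any two subspaces and if $\mathcal{M}$ is the subspace
spanned by $\mathcal{H}$ and $\mathcal{K}$ together, then $\mathcal{M}$ is the same as the set of all vectors
of the form $x + y$, with $x$ in $\mathcal{H}$ and $y$ in $\mathcal{K}$.

Prompted by this theorem, we shall use the notation $\mathcal{H} + \mathcal{K}$ for the
subspace $\mathcal{M}$ spanned by $\mathcal{H}$ and $\mathcal{K}$. We shall say that a subspace $\mathcal{K}$ of
a vector space $\mathcal{V}$ is a complement of a subspace $\mathcal{H}$ if $\mathcal{H} \cap \mathcal{K} = \mathcal{O}$ and
$\mathcal{H} + \mathcal{K} = \mathcal{V}$.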


§ 12. Dimension of a subspace 

Theorem 1. A subspace $\mathcal{M}$ in an $n$-dimensional vector space $\mathcal{V}$ is a vector
space of dimension $\leq n$.

Proof. It is possible to give a deceptively short proof of this theorem
that runs as follows. Every set of $n + 1$ vectors in $\mathcal{V}$ is linearly dependent,
hence the same is true of $\mathcal{M}$; hence, in particular, the number of elements
in each basis of $\mathcal{M}$ is $\leq n$, Q.E.D.

The trouble with this argument is that we defined dimension n by 
requiring in the first place that there exist a finite basis, and then demanding 
that this basis contain exactly n elements. The proof above shows only 
that no basis can contain more than n elements; it does not show that 
any basis exists. Once the difficulty is observed, however, it is easy to 
fill the gap. If $\mathcal{M} = \mathcal{O}$, then $\mathcal{M}$ is 0-dimensional, and we are done. If $\mathcal{M}$
contains a non-zero vector $x_1$, let $\mathcal{M}_1$ ($\subset \mathcal{M}$) be the subspace spanned by
$x_1$. If $\mathcal{M} = \mathcal{M}_1$, then $\mathcal{M}$ is 1-dimensional, and we are done. If $\mathcal{M} \neq \mathcal{M}_1$,
let $x_2$ be an element of $\mathcal{M}$ not contained in $\mathcal{M}_1$, and let $\mathcal{M}_2$ be the
subspace spanned by $x_1$ and $x_2$; and so on. Now we may legitimately employ
the argument given above; after no more than n steps of this sort, the 
process reaches an end, since (by § 8, Theorem 2) we cannot find n + 1 
linearly independent vectors. 

The following result is an important consequence of this second and
correct proof of Theorem 1.

Theorem 2. Given any $m$-dimensional subspace $\mathcal{M}$ in an $n$-dimensional
vector space $\mathcal{V}$, we can find a basis $\{x_1, \dots, x_m, x_{m+1}, \dots, x_n\}$ in $\mathcal{V}$ so
that $x_1, \dots, x_m$ are in $\mathcal{M}$ and form, therefore, a basis of $\mathcal{M}$.

We shall denote the dimension of a vector space $\mathcal{V}$ by the symbol $\dim \mathcal{V}$.
In this notation Theorem 1 asserts that if $\mathcal{M}$ is a subspace of a
finite-dimensional vector space $\mathcal{V}$, then $\dim \mathcal{M} \leq \dim \mathcal{V}$.
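
The step-by-step construction in the proof of Theorem 1 (adjoin a vector not yet in the span, and repeat) also produces the basis promised by Theorem 2. The following sketch illustrates this under assumptions of our own: the space is that of $n$-tuples of rational numbers, the candidates for adjunction are the coordinate vectors, and the names `rank` and `extend_to_basis` are hypothetical helpers, not notation from the text.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of a list of vectors, by Gaussian elimination over the rationals."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    n_cols = len(rows[0]) if rows else 0
    found = 0
    for col in range(n_cols):
        pivot = next((r for r in range(found, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(found + 1, len(rows)):
            factor = rows[r][col] / rows[found][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[found])]
        found += 1
    return found


def extend_to_basis(subspace_basis, n):
    """Extend a linearly independent list of n-tuples to a basis of the whole space."""
    basis = [tuple(v) for v in subspace_basis]
    for i in range(n):
        e = tuple(1 if k == i else 0 for k in range(n))
        # Adjoin e only if it is not already in the span of the vectors chosen so far.
        if rank(basis + [e]) > len(basis):
            basis.append(e)
    return basis


m_basis = [(1, 1, 0), (0, 1, 1)]          # a basis of a 2-dimensional subspace
print(extend_to_basis(m_basis, 3))        # [(1, 1, 0), (0, 1, 1), (1, 0, 0)]
```

The process stops after at most $n$ vectors have been collected, in agreement with § 8, Theorem 2.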


EXERCISES 

1. If $\mathcal{M}$ and $\mathcal{N}$ are finite-dimensional subspaces with the same dimension, and
if $\mathcal{M} \subset \mathcal{N}$, then $\mathcal{M} = \mathcal{N}$.

2. If $\mathcal{M}$ and $\mathcal{N}$ are subspaces of a vector space $\mathcal{V}$, and if every vector in $\mathcal{V}$ belongs
either to $\mathcal{M}$ or to $\mathcal{N}$ (or both), then either $\mathcal{M} = \mathcal{V}$ or $\mathcal{N} = \mathcal{V}$ (or both).

3. If $x$, $y$, and $z$ are vectors such that $x + y + z = 0$, then $x$ and $y$ span the
same subspace as $y$ and $z$.

4. Suppose that $x$ and $y$ are vectors and $\mathcal{M}$ is a subspace in a vector space $\mathcal{V}$;
let $\mathcal{H}$ be the subspace spanned by $\mathcal{M}$ and $x$, and let $\mathcal{K}$ be the subspace spanned
by $\mathcal{M}$ and $y$. Prove that if $y$ is in $\mathcal{H}$ but not in $\mathcal{M}$, then $x$ is in $\mathcal{K}$.

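5. Suppose that $\mathcal{L}$, $\mathcal{M}$, and $\mathcal{N}$ are subspaces of a vector space.

(a) Show that the equation

$$\mathcal{L} \cap (\mathcal{M} + \mathcal{N}) = (\mathcal{L} \cap \mathcal{M}) + (\mathcal{L} \cap \mathcal{N})$$

is not necessarily true.

(b) Prove that

$$\mathcal{L} \cap (\mathcal{M} + (\mathcal{L} \cap \mathcal{N})) = (\mathcal{L} \cap \mathcal{M}) + (\mathcal{L} \cap \mathcal{N}).$$

6. (a) Can it happen that a non-trivial subspace of a vector space $\mathcal{V}$ (i.e., a
subspace different from both $\mathcal{O}$ and $\mathcal{V}$) has a unique complement?

(b) If $\mathcal{M}$ is an $m$-dimensional subspace in an $n$-dimensional vector space, then
every complement of $\mathcal{M}$ has dimension $n - m$.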

7. (a) Show that if both $\mathcal{M}$ and $\mathcal{N}$ are three-dimensional subspaces of a
five-dimensional vector space, then $\mathcal{M}$ and $\mathcal{N}$ are not disjoint.

(b) If $\mathcal{M}$ and $\mathcal{N}$ are finite-dimensional subspaces of a vector space, then

$$\dim \mathcal{M} + \dim \mathcal{N} = \dim (\mathcal{M} + \mathcal{N}) + \dim (\mathcal{M} \cap \mathcal{N}).$$

8. A polynomial $x$ is called even if $x(-t) = x(t)$ identically in $t$ (see § 10, (3)),
and it is called odd if $x(-t) = -x(t)$.

(a) Both the class $\mathcal{M}$ of even polynomials and the class $\mathcal{N}$ of odd polynomials
are subspaces of the space $\mathcal{P}$ of all (complex) polynomials.

(b) Prove that $\mathcal{M}$ and $\mathcal{N}$ are each other's complements.


§ 13. Dual spaces

Definition. A linear functional on a vector space $\mathcal{V}$ is a scalar-valued
function $y$ defined for every vector $x$, with the property that (identically
in the vectors $x_1$ and $x_2$ and the scalars $\alpha_1$ and $\alpha_2$)

$$y(\alpha_1 x_1 + \alpha_2 x_2) = \alpha_1 y(x_1) + \alpha_2 y(x_2).$$

Let us look at some examples of linear functionals. 

(1) For $x = (\xi_1, \dots, \xi_n)$ in $\mathcal{C}^n$, write $y(x) = \xi_1$. More generally, let
$\alpha_1, \dots, \alpha_n$ be any $n$ scalars and write

$$y(x) = \alpha_1\xi_1 + \cdots + \alpha_n\xi_n.$$

We observe that for any linear functional $y$ on any vector space

$$y(0) = y(0 \cdot 0) = 0 \cdot y(0) = 0;$$

for this reason a linear functional, as we defined it, is sometimes called
homogeneous. In particular in $\mathcal{C}^n$, if $y$ is defined by

$$y(x) = \alpha_1\xi_1 + \cdots + \alpha_n\xi_n + \beta,$$

then $y$ is not a linear functional unless $\beta = 0$.

(2) For any polynomial $x$ in $\mathcal{P}$, write $y(x) = x(0)$. More generally,
let $\alpha_1, \dots, \alpha_n$ be any $n$ scalars, let $t_1, \dots, t_n$ be any $n$ real numbers, and
write

$$y(x) = \alpha_1 x(t_1) + \cdots + \alpha_n x(t_n).$$

Another example, in a sense a limiting case of the one just given, is obtained
as follows. Let $(a, b)$ be any finite interval on the real $t$-axis, and let $\alpha$
be any complex-valued integrable function defined on $(a, b)$; define $y$ by

$$y(x) = \int_a^b \alpha(t)x(t)\,dt.$$

(3) On an arbitrary vector space $\mathcal{V}$, define $y$ by writing

$$y(x) = 0$$

for every $x$ in $\mathcal{V}$.

The last example is the first hint of a general situation. Let $\mathcal{V}$ be any
vector space and let $\mathcal{V}'$ be the collection of all linear functionals on $\mathcal{V}$.
Let us denote by $0$ the linear functional defined in (3) (compare the comment
at the end of § 4). If $y_1$ and $y_2$ are linear functionals on $\mathcal{V}$ and if $\alpha_1$ and
$\alpha_2$ are scalars, let us write $y$ for the function defined by

$$y(x) = \alpha_1 y_1(x) + \alpha_2 y_2(x).$$

It is easy to see that $y$ is a linear functional; we denote it by $\alpha_1 y_1 + \alpha_2 y_2$.
With these definitions of the linear concepts (zero, addition, scalar
multiplication), the set $\mathcal{V}'$ forms a vector space, the dual space of $\mathcal{V}$.
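
The linear operations on functionals are defined pointwise, and this is easy to imitate with ordinary functions. The sketch below is an illustration only, under assumptions of our own: vectors are triples of integers, and `functional` and `combine` are hypothetical helper names. It builds functionals of the kind described in example (1), forms $\alpha_1 y_1 + \alpha_2 y_2$, and spot-checks the defining identity of a linear functional on one pair of vectors.

```python
def functional(coeffs):
    """The functional x -> a_1*xi_1 + ... + a_n*xi_n of example (1)."""
    return lambda x: sum(a * xi for a, xi in zip(coeffs, x))


def combine(a1, y1, a2, y2):
    """The functional a1*y1 + a2*y2, defined pointwise as in the text."""
    return lambda x: a1 * y1(x) + a2 * y2(x)


zero = lambda x: 0                      # the functional of example (3)

y1 = functional((1, 0, 0))              # y1(x) = xi_1
y2 = functional((2, -1, 3))
y = combine(5, y1, -2, y2)              # y = 5*y1 - 2*y2

x1, x2 = (1, 2, 3), (4, 0, -1)
a1, a2 = 3, -7
lin = tuple(a1 * p + a2 * q for p, q in zip(x1, x2))   # a1*x1 + a2*x2

print(y(lin) == a1 * y(x1) + a2 * y(x2))               # True
print(combine(1, y, 1, zero)(x1) == y(x1))             # True: y + 0 = y
```

A check at one pair of vectors does not prove linearity; the general verification is the easy computation referred to in the text.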

§ 14. Brackets 

Before studying linear functionals and dual spaces in more detail, we 
wish to introduce a notation that may appear weird at first sight but that 
will clarify many situations later on. Usually we denote a linear functional 
by a single letter such as y. Sometimes, however, it is necessary to use 
the function notation fully and to indicate somehow that if y is a linear 
functional on V and if x is a vector in *0, then y(x) is a particular scalar. 
According to the notation we propose to adopt here, we shall not write 
y followed by x in parentheses, but, instead, we shall write x and y enclosed 
between square brackets and separated by a comma. Because of the un- 
usual nature of this notation, we shall expend on it some further verbiage. 

As we have just pointed out [x, y] is a substitute for the ordinary func- 
tion symbol y(x); both these symbols denote the scalar we obtain if we 
take the value of the linear function y at the vector x. Let us take an 
analogous situation (concerned with functions that are, however, not
linear). Let $y$ be the real function of a real variable defined for each real
number $x$ by $y(x) = x^2$. The notation $[x, y]$ is a symbolic way of writing
down the recipe for actual operations performed; it corresponds to the 
sentence [take a number, and square it]. 

Using this notation, we may sum up: to every vector space $\mathcal{V}$ we make
correspond the dual space $\mathcal{V}'$ consisting of all linear functionals on $\mathcal{V}$;
to every pair, $x$ and $y$, where $x$ is a vector in $\mathcal{V}$ and $y$ is a linear functional
in $\mathcal{V}'$, we make correspond the scalar $[x, y]$ defined to be the value of $y$
at $x$. In terms of the symbol $[x, y]$ the defining property of a linear
functional is

$$[\alpha_1 x_1 + \alpha_2 x_2, y] = \alpha_1[x_1, y] + \alpha_2[x_2, y], \tag{1}$$

and the definition of the linear operations for linear functionals is

$$[x, \alpha_1 y_1 + \alpha_2 y_2] = \alpha_1[x, y_1] + \alpha_2[x, y_2]. \tag{2}$$

The two relations together are expressed by saying that $[x, y]$ is a bilinear
functional of the vectors $x$ in $\mathcal{V}$ and $y$ in $\mathcal{V}'$.


EXERCISES

1. Consider the set $\mathcal{C}$ of complex numbers as a real vector space (as in § 3, (9)).
Suppose that for each $x = \xi_1 + i\xi_2$ in $\mathcal{C}$ (where $\xi_1$ and $\xi_2$ are real numbers and
$i = \sqrt{-1}$) the function $y$ is defined by

(a) $y(x) = \xi_1$,

(b) $y(x) = \xi_2$,

(c) $y(x) = \xi_1^2$,

(d) $y(x) = \xi_1 - i\xi_2$,

(e) $y(x) = \sqrt{\xi_1^2 + \xi_2^2}$. (The square root sign attached to a positive number
always denotes the positive square root of that number.)

In which of these cases is y a linear functional? 

2. Suppose that for each $x = (\xi_1, \xi_2, \xi_3)$ in $\mathcal{C}^3$ the function $y$ is defined by

(a) $y(x) = \xi_1 + \xi_2$,

(b) $y(x) = \xi_1 - \xi_3^2$,

(c) $y(x) = \xi_1 + 1$,

(d) $y(x) = \xi_1 - 2\xi_2 + 3\xi_3$.

In which of these cases is y a linear functional? 

3. Suppose that for each $x$ in $\mathcal{P}$ the function $y$ is defined by

(a) y(x) x(t) dt, 

(b) y(x) = f (x(C)) ! <U, 

(c) y(x) = f t 2 x(t) dt, 

Jo 

(d) y(x) = f x(f 2 ) dt, 

Jo 

(e) y(x) = 

(f) 

In which of these cases is y a linear functional? 

4. If $(\alpha_0, \alpha_1, \alpha_2, \dots)$ is an arbitrary sequence of complex numbers, and if $x$ is
an element of $\mathcal{P}$, $x(t) = \sum_{i=0}^{n} \xi_i t^i$, write $y(x) = \sum_{i=0}^{n} \xi_i \alpha_i$. Prove that $y$ is an
element of $\mathcal{P}'$ and that every element of $\mathcal{P}'$ can be obtained in this manner by a
suitable choice of the $\alpha$'s.

5. If $y$ is a non-zero linear functional on a vector space $\mathcal{V}$, and if $\alpha$ is an arbitrary
scalar, does there necessarily exist a vector $x$ in $\mathcal{V}$ such that $[x, y] = \alpha$?

6. Prove that if $y$ and $z$ are linear functionals (on the same vector space) such
that $[x, y] = 0$ whenever $[x, z] = 0$, then there exists a scalar $\alpha$ such that $y = \alpha z$.
(Hint: if $[x_0, z] \neq 0$, write $\alpha = [x_0, y]/[x_0, z]$.)


§ 15. Dual bases

One more word before embarking on the proofs of the important theo- 
rems. The concept of dual space was defined without any reference to 
coordinate systems; a glance at the following proofs will show a super- 
abundance of coordinate systems. We wish to point out that this phenome- 
non is inevitable; we shall be establishing results concerning dimension, 
and dimension is the one concept (so far) whose very definition is given in 
terms of a basis. 

Theorem 1. If $\mathcal{V}$ is an $n$-dimensional vector space, if $\{x_1, \dots, x_n\}$ is a
basis in $\mathcal{V}$, and if $\{\alpha_1, \dots, \alpha_n\}$ is any set of $n$ scalars, then there is one
and only one linear functional $y$ on $\mathcal{V}$ such that $[x_i, y] = \alpha_i$ for $i = 1,
\dots, n$.

Proof. Every $x$ in $\mathcal{V}$ may be written in the form $x = \xi_1 x_1 + \cdots + \xi_n x_n$
in one and only one way; if $y$ is any linear functional, then

$$[x, y] = \xi_1[x_1, y] + \cdots + \xi_n[x_n, y].$$

From this relation the uniqueness of $y$ is clear; if $[x_i, y] = \alpha_i$, then the
value of $[x, y]$ is determined, for every $x$, by $[x, y] = \sum_i \xi_i\alpha_i$. The argument
can also be turned around; if we define $y$ by

$$[x, y] = \xi_1\alpha_1 + \cdots + \xi_n\alpha_n,$$

then $y$ is indeed a linear functional, and $[x_i, y] = \alpha_i$.

Theorem 2. If $\mathcal{V}$ is an $n$-dimensional vector space and if $\mathcal{X} = \{x_1,
\dots, x_n\}$ is a basis in $\mathcal{V}$, then there is a uniquely determined basis $\mathcal{X}'$ in
$\mathcal{V}'$, $\mathcal{X}' = \{y_1, \dots, y_n\}$, with the property that $[x_i, y_j] = \delta_{ij}$. Consequently
the dual space of an $n$-dimensional space is $n$-dimensional.

The basis $\mathcal{X}'$ is called the dual basis of $\mathcal{X}$.

Proof. It follows from Theorem 1 that, for each $j = 1, \dots, n$, a unique
$y_j$ in $\mathcal{V}'$ can be found so that $[x_i, y_j] = \delta_{ij}$; we have only to prove that the
set $\mathcal{X}' = \{y_1, \dots, y_n\}$ is a basis in $\mathcal{V}'$.

In the first place, $\mathcal{X}'$ is a linearly independent set, for if we had
$\alpha_1 y_1 + \cdots + \alpha_n y_n = 0$, in other words, if

$$[x, \alpha_1 y_1 + \cdots + \alpha_n y_n] = \alpha_1[x, y_1] + \cdots + \alpha_n[x, y_n] = 0$$

for all $x$, then we should have, for $x = x_i$,

$$0 = \sum_j \alpha_j[x_i, y_j] = \sum_j \alpha_j\delta_{ij} = \alpha_i.$$

In the second place, every $y$ in $\mathcal{V}'$ is a linear combination of $y_1, \dots, y_n$.
To prove this, write $[x_i, y] = \alpha_i$; then, for $x = \sum_i \xi_i x_i$, we have

$$[x, y] = \xi_1\alpha_1 + \cdots + \xi_n\alpha_n.$$

On the other hand

$$[x, y_j] = \sum_i \xi_i[x_i, y_j] = \xi_j,$$

so that, substituting in the preceding equation, we get

$$[x, y] = \alpha_1[x, y_1] + \cdots + \alpha_n[x, y_n] = [x, \alpha_1 y_1 + \cdots + \alpha_n y_n].$$

Consequently $y = \alpha_1 y_1 + \cdots + \alpha_n y_n$, and the proof of the theorem is
complete.

We shall need also the following easy consequence of Theorem 2. 

Theorem 3. If $u$ and $v$ are any two different vectors of the $n$-dimensional
vector space $\mathcal{V}$, then there exists a linear functional $y$ on $\mathcal{V}$ such that
$[u, y] \neq [v, y]$; or, equivalently, to any non-zero vector $x$ in $\mathcal{V}$ there corresponds
a $y$ in $\mathcal{V}'$ such that $[x, y] \neq 0$.

Proof. That the two statements in the theorem are indeed equivalent
is seen by considering $x = u - v$. We shall, accordingly, prove the latter
statement only.

Let $\mathcal{X} = \{x_1, \dots, x_n\}$ be any basis in $\mathcal{V}$, and let $\mathcal{X}' = \{y_1, \dots, y_n\}$ be
the dual basis in $\mathcal{V}'$. If $x = \sum_i \xi_i x_i$, then (as above) $[x, y_j] = \xi_j$. Hence
if $[x, y] = 0$ for all $y$, and, in particular, if $[x, y_j] = 0$ for $j = 1, \dots, n$,
then $x = 0$.
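
In coordinates the dual basis can be computed explicitly. The sketch below rests on assumptions of our own: vectors are $n$-tuples of rational numbers, a functional is represented by its coefficients $(\alpha_1, \dots, \alpha_n)$ as in § 13, (1), and `dual_basis` and `bracket` are hypothetical helper names. If the basis vectors are written as the columns of a matrix, the rows of the inverse matrix are the coefficient rows of the dual basis, and the relation $[x_i, y_j] = \delta_{ij}$ is exactly the statement that the two matrices are inverse to each other.

```python
from fractions import Fraction


def dual_basis(basis):
    """Coefficient rows of the dual basis of a basis of n-tuples.

    Row j holds the coefficients of y_j.  The rows are those of the inverse
    of the matrix whose columns are the basis vectors (Gauss-Jordan elimination).
    """
    n = len(basis)
    # Matrix of basis columns, augmented with the identity matrix.
    rows = [[Fraction(basis[j][i]) for j in range(n)]
            + [Fraction(int(i == k)) for k in range(n)]
            for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        p = rows[col][col]
        rows[col] = [a / p for a in rows[col]]
        for r in range(n):
            if r != col:
                f = rows[r][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[col])]
    return [tuple(row[n:]) for row in rows]


def bracket(x, y):
    """The value [x, y] of the functional with coefficient row y at the vector x."""
    return sum(a * b for a, b in zip(x, y))


X = [(1, 0, 0), (1, 1, 0), (1, 1, 1)]     # an arbitrarily chosen basis
Y = dual_basis(X)
print(Y == [(1, -1, 0), (0, 1, -1), (0, 0, 1)])                       # True
print(all(bracket(X[i], Y[j]) == (1 if i == j else 0)
          for i in range(3) for j in range(3)))                       # True
```

The function assumes that its argument really is a basis; otherwise no pivot is found and the elimination fails.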


§ 16. Reflexivity

It is natural to think that if the dual space $\mathcal{V}'$ of a vector space $\mathcal{V}$, and
the relations between a space and its dual, are of any interest at all for
$\mathcal{V}$, then they are of just as much interest for $\mathcal{V}'$. In other words, we propose
now to form the dual space $(\mathcal{V}')'$ of $\mathcal{V}'$; for simplicity of notation we shall
denote it by $\mathcal{V}''$. The verbal description of an element of $\mathcal{V}''$ is clumsy:
such an element is a linear functional of linear functionals. It is, however,
at this point that the greatest advantage of the notation $[x, y]$ appears;
by means of it, it is easy to discuss $\mathcal{V}$ and its relation to $\mathcal{V}''$.

If we consider the symbol $[x, y]$ for some fixed $y = y_0$, we obtain nothing
new: $[x, y_0]$ is merely another way of writing the value $y_0(x)$ of the function
$y_0$ at the vector $x$. If, however, we consider the symbol $[x, y]$ for some
fixed $x = x_0$, then we observe that the function of the vectors in $\mathcal{V}'$, whose
value at $y$ is $[x_0, y]$, is a scalar-valued function that happens to be linear
(see § 14, (2)); in other words, $[x_0, y]$ defines a linear functional on $\mathcal{V}'$,
and, consequently, an element of $\mathcal{V}''$.

By this method we have exhibited some linear functionals on $\mathcal{V}'$; have
we exhibited them all? For the finite-dimensional case the following
theorem furnishes the affirmative answer.

Theorem. If $\mathcal{V}$ is a finite-dimensional vector space, then corresponding
to every linear functional $z_0$ on $\mathcal{V}'$ there is a vector $x_0$ in $\mathcal{V}$ such that
$z_0(y) = [x_0, y] = y(x_0)$ for every $y$ in $\mathcal{V}'$; the correspondence $z_0 \rightleftarrows x_0$ between
$\mathcal{V}''$ and $\mathcal{V}$ is an isomorphism.

The correspondence described in this statement is called the natural
correspondence between $\mathcal{V}''$ and $\mathcal{V}$.

Proof. Let us view the correspondence from the standpoint of going
from $\mathcal{V}$ to $\mathcal{V}''$; in other words, to every $x_0$ in $\mathcal{V}$ we make correspond a
vector $z_0$ in $\mathcal{V}''$ defined by $z_0(y) = y(x_0)$ for every $y$ in $\mathcal{V}'$. Since $[x, y]$
depends linearly on $x$, the transformation $x_0 \to z_0$ is linear.

We shall show that this transformation is one-to-one, as far as it goes.
We assert, in other words, that if $x_1$ and $x_2$ are in $\mathcal{V}$, and if $z_1$ and $z_2$ are
the corresponding vectors in $\mathcal{V}''$ (so that $z_1(y) = [x_1, y]$ and $z_2(y) = [x_2, y]$
for all $y$ in $\mathcal{V}'$), and if $z_1 = z_2$, then $x_1 = x_2$. To say that $z_1 = z_2$ means
that $[x_1, y] = [x_2, y]$ for every $y$ in $\mathcal{V}'$; the desired conclusion follows from
§ 15, Theorem 3.

The last two paragraphs together show that the set of those linear
functionals $z$ on $\mathcal{V}'$ (that is, elements of $\mathcal{V}''$) that do have the desired form
(that is, $z(y)$ is identically equal to $[x, y]$ for a suitable $x$ in $\mathcal{V}$) is a subspace
of $\mathcal{V}''$ which is isomorphic to $\mathcal{V}$ and which is, therefore, $n$-dimensional.
But the $n$-dimensionality of $\mathcal{V}$ implies that of $\mathcal{V}'$, which in turn implies
that $\mathcal{V}''$ is $n$-dimensional. It follows that $\mathcal{V}''$ must coincide with the
$n$-dimensional subspace just described, and the proof of the theorem is
complete.

It is important to observe that the theorem shows not only that $\mathcal{V}$ and
$\mathcal{V}''$ are isomorphic (this much is trivial from the fact that they have the
same dimension) but that the natural correspondence is an isomorphism.
This property of vector spaces is called reflexivity; every finite-dimensional
vector space is reflexive.

It is frequently convenient to be mildly sloppy about $\mathcal{V}''$: for
finite-dimensional vector spaces we shall identify $\mathcal{V}''$ with $\mathcal{V}$ (by the natural
isomorphism), and we shall say that the element $z_0$ of $\mathcal{V}''$ is the same as
the element $x_0$ of $\mathcal{V}$ whenever $z_0(y) = [x_0, y]$ for all $y$ in $\mathcal{V}'$. In this language
it is very easy to express the relation between a basis $\mathcal{X}$, in $\mathcal{V}$, and the dual
basis of its dual basis, in $\mathcal{V}''$; the symmetry of the relation $[x_i, y_j] = \delta_{ij}$
shows that $\mathcal{X}'' = \mathcal{X}$.


§ 17. Annihilators

Definition. The annihilator $\mathcal{S}^0$ of any subset $\mathcal{S}$ of a vector space $\mathcal{V}$
($\mathcal{S}$ need not be a subspace) is the set of all vectors $y$ in $\mathcal{V}'$ such that
$[x, y]$ is identically zero for all $x$ in $\mathcal{S}$.

Thus $\mathcal{O}^0 = \mathcal{V}'$ and $\mathcal{V}^0 = \mathcal{O}$ ($\subset \mathcal{V}'$). If $\mathcal{V}$ is finite-dimensional and $\mathcal{S}$
contains a non-zero vector, so that $\mathcal{S} \neq \mathcal{O}$, then § 15, Theorem 3 shows
that $\mathcal{S}^0 \neq \mathcal{V}'$.

Theorem 1. If 3TC is an m-dimensional subspace of an n-dimensional 
vector space V, then 311° is an {n — m)-dimensional subspace of V'. 

proof. We leave it to the reader to verify that 3TC° (in fact S°, for an 
arbitrary S) is always a subspace; we shall prove only the statement con- 
cerning the dimension of 3Tl°. 

Let 9C = {xi, ••■,*») bea basis in V whose first m elements are in 311 
(and form therefore a basis for 3TI); let SC' = { Z/i , •••, V » ! be the dual 
basis in T)'. We denote by 31 the subspace (in *0') spanned by y m +i, • • • , Vn) 
clearly 31 has dimension n — m. We shall prove that 3E° = 31. 

If x is any vector in 311, then x is a linear combination of xj, • • • , x m , 

* = - i &*»> 

and, for any i = m + 1, •••,«, we have 

[x, yj] = $,[x<, yf[ = 0. 

In other words, y, is in 3E° for j = m + 1, • • •, n; it follows that 31 is 
contained in 3R°, 

31 C 3Tl°. 

Suppose, on the other hand, that y is any element of 3Tt°. Since y, being 
in V', is a linear combination of the basis vectors y\, • • • , y n , we may write 

y = £"-i Wi- 

Since, by assumption, y is in 911°, we have, for every i = 1, * * *, m f 

o = fa, y] = Z>-i nfaif vA * nii 

in other words, y is a linear combination of y m + i, * * *> 2/»* Tbis proves 
that y is in 91, and consequently that 

SIX 0 <Z 01, 


and the theorem follows.

Theorem 2. If M is a subspace in a finite-dimensional vector space V,
then M°° (= (M°)°) = M.

Proof. Observe that we use here the convention, established at the
end of § 16, that identifies V and V''. By definition, M°° is the set of all
vectors x such that [x, y] = 0 for all y in M°. Since, by the definition of
M°, [x, y] = 0 for all x in M and all y in M°, it follows that M ⊂ M°°.
The desired conclusion now follows from a dimension argument. Let
M be m-dimensional and V n-dimensional; then the dimension of M° is
n − m, and that of M°° is n − (n − m) = m. Hence M = M°°, as was to be
proved.
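
The dimension count in Theorems 1 and 2 can be watched in coordinates. The following Python sketch is only an illustration and is not part of the argument above. It assumes that V is the space Q^n of rational n-tuples and that a functional y in V' is identified with its coefficient tuple (η_1, ..., η_n), so that [x, y] = ξ_1 η_1 + ... + ξ_n η_n; the helper names rref and annihilator are our own.

```python
from fractions import Fraction

def rref(rows):
    """Return (non-zero rows of the reduced row echelon form, pivot columns)."""
    m = [[Fraction(v) for v in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0]) if m else 0):
        # Look for a row at or below r with a non-zero entry in column c.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        m[r] = [v / m[r][c] for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m[:r], pivots

def annihilator(spanning, n):
    """Basis of all y in Q^n with sum_i x[i] * y[i] == 0 for every x in spanning."""
    reduced, pivots = rref(spanning)
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        y = [Fraction(0)] * n
        y[free] = Fraction(1)
        # Each reduced row forces the value of y at its pivot column.
        for row, p in zip(reduced, pivots):
            y[p] = -row[free]
        basis.append(y)
    return basis

M = [[1, 1, 1], [1, 1, -1]]        # a 2-dimensional subspace of Q^3
M0 = annihilator(M, 3)             # its annihilator
M00 = annihilator(M0, 3)           # the annihilator of the annihilator
```

Here M0 has exactly n − m = 3 − 2 = 1 element, and M00 has the same reduced row echelon form as M, which is one way of expressing M°° = M in this coordinate model.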


EXERCISES

1. Define a non-zero linear functional y on C^3 such that if x_1 = (1, 1, 1)
and x_2 = (1, 1, −1), then [x_1, y] = [x_2, y] = 0.

2. The vectors x_1 = (1, 1, 1), x_2 = (1, 1, −1), and x_3 = (1, −1, −1) form
a basis of C^3. If {y_1, y_2, y_3} is the dual basis, and if x = (0, 1, 0), find
[x, y_1], [x, y_2], and [x, y_3].

3. Prove that if y is a linear functional on an n-dimensional vector space V,
then the set of all those vectors x for which [x, y] = 0 is a subspace of V; what
is the dimension of that subspace?

4. If y(x) = ξ_1 + ξ_2 + ξ_3 whenever x = (ξ_1, ξ_2, ξ_3) is a vector in C^3,
then y is a linear functional on C^3; find a basis of the subspace consisting of
all those vectors x for which [x, y] = 0.

5. Prove that if m < n, and if y_1, ..., y_m are linear functionals on an
n-dimensional vector space V, then there exists a non-zero vector x in V such
that [x, y_j] = 0 for j = 1, ..., m. What does this result say about the
solutions of linear equations?

6. Suppose that m < n and that y_1, ..., y_m are linear functionals on an
n-dimensional vector space V. Under what conditions on the scalars
α_1, ..., α_m is it true that there exists a vector x in V such that
[x, y_j] = α_j for j = 1, ..., m? What does this result say about the solutions
of linear equations?

7. If V is an n-dimensional vector space over a finite field, and if 0 ≤ m ≤ n,
then the number of m-dimensional subspaces of V is the same as the number
of (n − m)-dimensional subspaces.

8. (a) Prove that if S is any subset of a finite-dimensional vector space, then
S°° coincides with the subspace spanned by S.

(b) If S and T are subsets of a vector space, and if S ⊂ T, then T° ⊂ S°.

(c) If M and N are subspaces of a finite-dimensional vector space, then
(M ∩ N)° = M° + N° and (M + N)° = M° ∩ N°. (Hint: make repeated use of (b)
and of § 17, Theorem 2.)

(d) Is the conclusion of (c) valid for not necessarily finite-dimensional vector
spaces?

9. This exercise is concerned with vector spaces that need not be
finite-dimensional; most of its parts (but not all) depend on the sort of
transfinite reasoning that is needed to prove that every vector space has a
basis (cf. § 7, Ex. 11).

(a) Suppose that f and g are scalar-valued functions defined on a set X; if α
and β are scalars write h = αf + βg for the function defined by
h(x) = αf(x) + βg(x) for all x in X. The set of all such functions is a vector
space with respect to this definition of the linear operations, and the same is
true of the set of all finitely non-zero functions. (A function f on X is finitely
non-zero if the set of those elements x of X for which f(x) ≠ 0 is finite.)

(b) Every vector space is isomorphic to the set of all finitely non-zero functions
on some set.

(c) If V is a vector space with basis X, and if f is a scalar-valued function
defined on the set X, then there exists a unique linear functional y on V such
that [x, y] = f(x) for all x in X.

(d) Use (a), (b), and (c) to conclude that every vector space V is isomorphic to
a subspace of V'.

(e) Which vector spaces are isomorphic to their own duals? 

(f) If Y is a linearly independent subset of a vector space V, then there exists
a basis of V containing Y. (Compare this result with the theorem of § 7.)

(g) If X is a set and if y is an element of X, write f_y for the scalar-valued
function defined on X by writing f_y(x) = 1 or 0 according as x = y or x ≠ y.
Let Y be the set of all functions f_y together with the function g defined by
g(x) = 1 for all x in X. Prove that if X is infinite, then Y is a linearly
independent subset of the vector space of all scalar-valued functions on X.

(h) The natural correspondence from V to V'' is defined for all vector spaces
(not only for the finite-dimensional ones); if x_0 is in V, define the
corresponding element z_0 of V'' by writing z_0(y) = [x_0, y] for all y in V'.
Prove that if V is reflexive (i.e., if every z_0 in V'' can be obtained in this
manner by a suitable choice of x_0), then V is finite-dimensional. (Hint:
represent V' as the set of all scalar-valued functions on some set, and then use
(g), (f), and (c) to construct an element of V'' that is not induced by an element
of V.)

Warning: the assertion that a vector space is reflexive if and only if it is finite- 
dimensional would shock most of the experts in the subject. The reason is that 
the customary and fruitful generalization of the concept of reflexivity to infinite- 
dimensional spaces is not the simple-minded one given in (h). 


§ 18. Direct sums 

We shall study several important general methods of making new vector 
spaces out of old ones; in this section we begin by studying the easiest one. 

Definition. If U and V are vector spaces (over the same field), their
direct sum is the vector space W (denoted by U ⊕ V) whose elements
are all the ordered pairs (x, y) with x in U and y in V, with the linear
operations defined by

$$\alpha_1 (x_1, y_1) + \alpha_2 (x_2, y_2) = (\alpha_1 x_1 + \alpha_2 x_2,\ \alpha_1 y_1 + \alpha_2 y_2).$$

We observe that the formation of the direct sum is analogous to the way
in which the plane is constructed from its two coordinate axes.

We proceed to investigate the relation of this notion to some of our
earlier ones.

The set of all vectors (in W) of the form (x, 0) is a subspace of W; the
correspondence (x, 0) ⇄ x shows that this subspace is isomorphic to U.
It is convenient, once more, to indulge in a logical inaccuracy and,
identifying x and (x, 0), to speak of U as a subspace of W. Similarly, of
course, the vectors y of V may be identified with the vectors of the form
(0, y) in W, and we may consider V as a subspace of W. This terminology
is, to be sure, not quite exact, but the logical difficulty is much easier to
get around here than it was in the case of the second dual space. We could
have defined the direct sum of U and V (at least in the case in which U
and V have no non-zero vectors in common) as the set consisting of all
x's in U, all y's in V, and all those pairs (x, y) for which x ≠ 0 and y ≠ 0.
This definition yields a theory analogous in every detail to the one we
shall develop, but it makes it a nuisance to prove theorems because of the
case distinctions it necessitates. It is clear, however, that from the point
of view of this definition U is actually a subset of U ⊕ V. In this sense
then, or in the isomorphism sense of the definition we did adopt, we raise
the question: what is the relation between U and V when we consider these
spaces as subspaces of the big space W?

Theorem. If U and V are subspaces of a vector space W, then the following
three conditions are equivalent.

(1) W = U ⊕ V.

(2) U ∩ V = 0 and U + V = W (i.e., U and V are complements of
each other).

(3) Every vector z in W may be written in the form z = x + y, with
x in U and y in V, in one and only one way.

Proof. We shall prove the implications (1) ⇒ (2) ⇒ (3) ⇒ (1).

(1) ⇒ (2). We assume that W = U ⊕ V. If z = (x, y) lies in both
U and V, then x = y = 0, so that z = 0; this proves that U ∩ V = 0.
Since the representation z = (x, 0) + (0, y) is valid for every z, it follows
also that U + V = W.

(2) ⇒ (3). If we assume (2), so that, in particular, U + V = W, then
it is clear that every z in W has the desired representation, z = x + y.
To prove uniqueness, we assume that z = x_1 + y_1 and z = x_2 + y_2, with
x_1 and x_2 in U and y_1 and y_2 in V. Since x_1 + y_1 = x_2 + y_2, it follows
that x_1 − x_2 = y_2 − y_1. Since the left member of this last equation is
in U and the right member is in V, the disjointness of U and V implies
that x_1 = x_2 and y_1 = y_2.

(3) ⇒ (1). This implication is practically indistinguishable from the
definition of direct sum. If we form the direct sum U ⊕ V, and then
identify (x, 0) and (0, y) with x and y respectively, we are committed to
identifying the sum (x, y) = (x, 0) + (0, y) with what we are assuming
to be the general element z = x + y of W; from the hypothesis that the
representation of z in the form x + y is unique we conclude that the
correspondence between (x, 0) and x (and also between (0, y) and y) is
one-to-one.

If two subspaces U and V in a vector space W are disjoint and span
W (that is, if they satisfy (2)), it is usual to say that W is the internal
direct sum of U and V; symbolically, as before, W = U ⊕ V. If we want
to emphasize the distinction between this concept and the one defined
before, we describe the earlier one by saying that W is the external direct
sum of U and V. In view of the natural isomorphisms discussed above,
and, especially, in view of the preceding theorem, the distinction is more
pedantic than conceptual. In accordance with our identification
convention, we shall usually ignore it.
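
Condition (3) of the theorem is easy to watch in coordinates. The sketch below is a hedged illustration only: it assumes that W is Q^n, that U and V are given by bases whose union is a basis of W, and that exact rational arithmetic (Python's fractions module) is acceptable. The function names solve, combine, and decompose are hypothetical and do not come from the text.

```python
from fractions import Fraction

def solve(vectors, z):
    """Coefficients c with sum_k c[k] * vectors[k] == z; vectors must be a basis of Q^n."""
    n = len(z)
    if len(vectors) != n:
        raise ValueError("need exactly n vectors")
    # Row i of the augmented matrix holds the i-th coordinates of the vectors, then z[i].
    a = [[Fraction(v[i]) for v in vectors] + [Fraction(z[i])] for i in range(n)]
    for c in range(n):
        p = next((i for i in range(c, n) if a[i][c] != 0), None)
        if p is None:
            raise ValueError("the given vectors do not form a basis")
        a[c], a[p] = a[p], a[c]
        a[c] = [v / a[c][c] for v in a[c]]
        for i in range(n):
            if i != c and a[i][c] != 0:
                f = a[i][c]
                a[i] = [s - f * t for s, t in zip(a[i], a[c])]
    return [row[n] for row in a]

def combine(coeffs, vectors, n):
    """The linear combination sum_k coeffs[k] * vectors[k] in Q^n."""
    return [sum(c * v[i] for c, v in zip(coeffs, vectors)) for i in range(n)]

def decompose(basis_u, basis_v, z):
    """Return (x, y) with x in U, y in V, and x + y == z."""
    coeffs = solve(basis_u + basis_v, z)
    k = len(basis_u)
    return (combine(coeffs[:k], basis_u, len(z)),
            combine(coeffs[k:], basis_v, len(z)))

# The plane as the direct sum of the horizontal axis and the diagonal.
x, y = decompose([[1, 0]], [[1, 1]], [3, 2])
```

The uniqueness asserted in (3) corresponds to the fact that the linear system has exactly one solution; when the two bases together fail to be a basis, the sketch raises an error instead of returning a decomposition.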


§ 19. Dimension of a direct sum 

What can be said about the dimension of a direct sum? If U is
n-dimensional, V is m-dimensional, and W = U ⊕ V, what is the dimension
of W? This question is easy to answer.

Theorem 1. The dimension of a direct sum is the sum of the dimensions
of its summands.

Proof. We assert that if {x_1, ..., x_n} is a basis in U, and if {y_1, ..., y_m}
is a basis in V, then the set {x_1, ..., x_n, y_1, ..., y_m} (or, more precisely,
the set {(x_1, 0), ..., (x_n, 0), (0, y_1), ..., (0, y_m)}) is a basis in W. The
easiest proof of this assertion is to use the implication (1) ⇒ (3) from
the theorem of the preceding section. Since every z in W may be written
in the form z = x + y, where x is a linear combination of x_1, ..., x_n and
y is a linear combination of y_1, ..., y_m, it follows that our set does indeed
span W. To show that the set is also linearly independent, suppose that

$$\alpha_1 x_1 + \cdots + \alpha_n x_n + \beta_1 y_1 + \cdots + \beta_m y_m = 0.$$

The uniqueness of the representation of 0 in the form x + y implies that

$$\alpha_1 x_1 + \cdots + \alpha_n x_n = \beta_1 y_1 + \cdots + \beta_m y_m = 0,$$

and hence the linear independence of the x's and of the y's implies that

$$\alpha_1 = \cdots = \alpha_n = \beta_1 = \cdots = \beta_m = 0.$$

Theorem 2. If W is any (n + m)-dimensional vector space, and if U
is any n-dimensional subspace of W, then there exists an m-dimensional
subspace V in W such that W = U ⊕ V.

Proof. Let {x_1, ..., x_n} be any basis in U; by the theorem of § 7 we
may find a set {y_1, ..., y_m} of vectors in W with the property that
{x_1, ..., x_n, y_1, ..., y_m} is a basis in W. Let V be the subspace spanned by
y_1, ..., y_m; we omit the verification that W = U ⊕ V.

Theorem 2 says that every subspace of a finite-dimensional vector space
has a complement.
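
The construction in the proof of Theorem 2 can be imitated mechanically. The following sketch is an illustration under stated assumptions, not the text's own procedure: W is taken to be Q^n, the given basis of U is assumed to be linearly independent, and the extending vectors are chosen greedily from the standard basis (one of many possible choices, since complements are not unique). The names rank and complement_basis are ours.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors over the rationals (Gaussian elimination)."""
    m = [[Fraction(v) for v in row] for row in vectors]
    r = 0
    for c in range(len(m[0]) if m else 0):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def complement_basis(basis_u, n):
    """Standard basis vectors of Q^n that extend basis_u to a basis of Q^n."""
    current = [list(v) for v in basis_u]
    chosen = []
    for k in range(n):
        e = [1 if i == k else 0 for i in range(n)]
        # Keep e only if it is not already in the span of what we have so far.
        if rank(current + [e]) > len(current):
            current.append(e)
            chosen.append(e)
    return chosen

basis_u = [[1, 1, 0]]                 # U is a line in Q^3
basis_v = complement_basis(basis_u, 3)  # a basis of one complement V
```

In this example the complement found is two-dimensional, so that the dimensions of U and V add up to 3, as Theorem 1 requires.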


§ 20. Dual of a direct sum 

In most of what follows we shall view the notion of direct sum as defined
for subspaces of a vector space V; this avoids the fuss with the identification
convention of § 18, and it turns out, incidentally, to be the more useful
concept for our later work. We conclude, for the present, our study of
direct sums, by observing the simple relation connecting dual spaces,
annihilators, and direct sums. To emphasize our present view of direct
summation, we return to the letters of our earlier notation.

Theorem. If M and N are subspaces of a vector space V, and if
V = M ⊕ N, then M' is isomorphic to N° and N' to M°, and V' = M° ⊕ N°.

Proof. To simplify the notation we shall use, throughout this proof,
x, x', and x° for elements of M, M', and M°, respectively, and we reserve,
similarly, the letters y for N and z for V. (This notation is not meant to
suggest that there is any particular relation between, say, the vectors
x in M and the vectors x' in M'.)

If z' belongs to both M° and N°, i.e., if z'(x) = z'(y) = 0 for all x and
y, then z'(z) = z'(x + y) = 0 for all z; this implies that M° and N° are
disjoint. If, moreover, z' is any vector in V', and if z = x + y, we write
x°(z) = z'(y) and y°(z) = z'(x). It is easy to see that the functions x°
and y° thus defined are linear functionals on V (i.e., elements of V')
belonging to M° and N° respectively; since z' = x° + y°, it follows that V' is
indeed the direct sum of M° and N°.

To establish the asserted isomorphisms, we make correspond to every
x° a y' in N' defined by y'(y) = x°(y). We leave to the reader the routine
verification that the correspondence x° → y' is linear and one-to-one,
and therefore an isomorphism between M° and N'; the corresponding
result for N° and M' follows from symmetry by interchanging x and y.
(Observe that for finite-dimensional vector spaces the mere existence of
an isomorphism between, say, M° and N' is trivial from a dimension
argument; indeed, the dimensions of both M° and N' are equal to the
dimension of N.)
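
The splitting z' = x° + y° used in the proof can be tried out on a small example. The sketch below is purely illustrative and rests on assumptions of our own: V is Q^2, M is the horizontal axis spanned by (1, 0), N is the diagonal spanned by (1, 1), and a functional z' is given by two coefficients a and b with z'(z) = a z_1 + b z_2. The function names are hypothetical.

```python
def split(z):
    """Write z = x + y with x in M = span{(1, 0)} and y in N = span{(1, 1)}."""
    z1, z2 = z
    return (z1 - z2, 0), (z2, z2)

def functional(a, b):
    """The linear functional z' with z'(z) = a * z[0] + b * z[1]."""
    return lambda z: a * z[0] + b * z[1]

def decompose_functional(zp):
    """Return (x0, y0) as in the proof: x0(z) = z'(y) and y0(z) = z'(x)."""
    x0 = lambda z: zp(split(z)[1])   # vanishes on M, so it belongs to M°
    y0 = lambda z: zp(split(z)[0])   # vanishes on N, so it belongs to N°
    return x0, y0

zp = functional(2, 5)
x0, y0 = decompose_functional(zp)
z = (3, 7)
total = x0(z) + y0(z)                # agrees with zp(z)
```

The design mirrors the proof directly: each piece of z' is obtained by first splitting the argument z and then evaluating z' on only one of the two summands.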

We remark, concerning our entire presentation of the theory of direct 
sums, that there is nothing magic about the number two; we could have 
defined the direct sum of any finite number of vector spaces, and we could 
have proved the obvious analogues of all the theorems of the last three 
sections, with only the notation becoming more complicated. We serve 
warning that we shall use this remark later and treat the theorems it implies 
as if we had proved them. 


EXERCISES 

1. Suppose that x, y, u, and v are vectors in C^4; let M and N be the
subspaces of C^4 spanned by {x, y} and {u, v} respectively. In which of the
following cases is it true that C^4 = M ⊕ N?

(a) x = (1, 1, 0, 0), y = (1, 0, 1, 0),
    u = (0, 1, 0, 1), v = (0, 0, 1, 1).

(b) x = (−1, 1, 1, 0), y = (0, 1, −1, 1),
    u = (1, 0, 0, 0), v = (0, 0, 0, 1).

(c) x = (1, 0, 0, 1), y = (0, 1, 1, 0),
    u = (1, 0, 1, 0), v = (0, 1, 0, 1).

2. If M is the subspace consisting of all those vectors
(ξ_1, ..., ξ_n, ξ_{n+1}, ..., ξ_{2n}) in C^{2n} for which ξ_1 = ... = ξ_n = 0,
and if N is the subspace of all those vectors for which ξ_j = ξ_{n+j},
j = 1, ..., n, then C^{2n} = M ⊕ N.

3. Construct three subspaces M, N_1, and N_2 of a vector space V so that
M ⊕ N_1 = M ⊕ N_2 = V but N_1 ≠ N_2. (Note that this means that there is no
cancellation law for direct sums.) What is the geometric picture corresponding
to this situation?

4. (a) If U, V, and W are vector spaces, what is the relation between
U ⊕ (V ⊕ W) and (U ⊕ V) ⊕ W (i.e., in what sense is the formation of direct
sums an associative operation)?

(b) In what sense is the formation of direct sums commutative?

5. (a) Three subspaces L, M, and N of a vector space V are called independent
if each one is disjoint from the sum of the other two. Prove that a necessary and
sufficient condition for V = L ⊕ (M ⊕ N) (and also for V = (L ⊕ M) ⊕ N) is that
L, M, and N be independent and that V = L + M + N. (The subspace L + M + N
is the set of all vectors of the form x + y + z, with x in L, y in M, and
z in N.)

(b) Give an example of three subspaces of a vector space V, such that the sum
of all three is V, such that every two of the three are disjoint, but such that the
three are not independent.

(c) Suppose that x, y, and z are elements of a vector space and that L, M, and
N are the subspaces spanned by x, y, and z, respectively. Prove that the vectors
x, y, and z are linearly independent if and only if the subspaces L, M, and N are
independent.

(d) Prove that three finite-dimensional subspaces are independent if and only
if the sum of their dimensions is equal to the dimension of their sum.

(e) Generalize the results (a)-(d) from three subspaces to any finite number.


§ 21. Quotient spaces

We know already that if M is a subspace of a vector space V, then there
are, usually, many other subspaces N in V such that M ⊕ N = V. There
is no natural way of choosing one from among the wealth of complements
of M. There is, however, a natural construction that associates with M
and V a new vector space that, for all practical purposes, plays the role of
a complement of M. The theoretical advantage that the construction has
over the formation of an arbitrary complement is precisely its “natural”
character, i.e., the fact that it does not depend on choosing a basis, or, for
that matter, on choosing anything at all.

In order to understand the construction it is a good idea to keep a picture
in mind. Suppose, for instance, that V = R^2 (the real coordinate plane)
and that M consists of all those vectors (ξ_1, ξ_2) for which ξ_2 = 0 (the
horizontal axis). Each complement of M is a line (other than the horizontal
axis) through the origin. Observe that each such complement has the
property that it intersects every horizontal line in exactly one point. The
idea of the construction we shall describe is to make a vector space out of
the set of all horizontal lines.

We begin by using M to single out certain subsets of V. (We are back
in the general case now.) If x is an arbitrary vector in V, we write x + M
for the set of all sums x + y with y in M; each set of the form x + M is
called a coset of M. (In the case of the plane-line example above, the
cosets are the horizontal lines.) Note that one and the same coset can arise
from two different vectors, i.e., that even if x ≠ y, it is possible that
x + M = y + M. It makes good sense, just the same, to speak of a
coset, say H, of M, without specifying which element (or elements) H
comes from; to say that H is a coset (of M) means simply that there is at
least one x such that H = x + M.

If H and K are cosets (of M), we write H + K for the set of all sums
u + v with u in H and v in K; we assert that H + K is also a coset of M.
Indeed, if H = x + M and K = y + M, then every element of H + K
belongs to the coset (x + y) + M (note that M + M = M), and,
conversely, every element of (x + y) + M is in H + K. (If, for instance, z
is in M, then (x + y) + z = (x + z) + (y + 0).) In other words,
H + K = (x + y) + M, so that H + K is a coset, as asserted. We leave to the
reader the verification that coset addition is commutative and associative.
The coset M (i.e., 0 + M) is such that K + M = K for every coset K,
and, moreover, M is the only coset with this property. (If
(x + M) + (y + M) = x + M, then x + M contains x + y, so that
x + y = x + u for some u in M; this implies that y is in M, and hence that
y + M = M.) If K is a coset, then the set consisting of all the vectors −u,
with u in K, is itself a coset, which we shall denote by −K. The coset −K is
such that K + (−K) = M, and, moreover, −K is the only coset with this
property. To sum up: the addition of cosets satisfies the axioms (A) of
§ 2.

If K is a coset and if α is a scalar, we write αK for the set consisting of
all the vectors αu with u in K in case α ≠ 0; the coset 0 · K is defined to be
M. A simple verification shows that this concept of multiplication satisfies
the axioms (B) and (C) of § 2.

The set of all cosets has thus been proved to be a vector space with respect
to the linear operations defined above. This vector space is called the
quotient space of V modulo M; it is denoted by V/M.
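
The plane-line picture above lends itself to a small experiment. The sketch that follows is only an illustration, with assumptions of our own: vectors are pairs of numbers, M is the horizontal axis, and a coset is handled through any one of its representatives. It relies on the standard fact that x + M = y + M exactly when x − y is in M; the function names are hypothetical.

```python
def in_m(v):
    """Membership in M = {(xi1, xi2) : xi2 = 0}, the horizontal axis."""
    return v[1] == 0

def same_coset(x, y):
    """x + M and y + M coincide exactly when x - y lies in M."""
    return in_m((x[0] - y[0], x[1] - y[1]))

def add(x, y):
    """A representative of (x + M) + (y + M), namely x + y."""
    return (x[0] + y[0], x[1] + y[1])

def scale(a, x):
    """A representative of a(x + M); for a = 0 it represents the coset M."""
    return (a * x[0], a * x[1])

# Two representatives of the same horizontal line, and of another one.
h1, h2 = (1, 2), (5, 2)
k1, k2 = (0, 3), (-7, 3)
well_defined = same_coset(add(h1, k1), add(h2, k2))
```

The value well_defined records that the sum of two cosets does not depend on which representatives are used, which is the content of the assertion that H + K is again a coset.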

§ 22. Dimension of a quotient space 

Theorem 1. If M and N are complementary subspaces of a vector space
V, then the correspondence that assigns to each vector y in N the coset y + M
is an isomorphism between N and V/M.

Proof. If y_1 and y_2 are elements of N such that y_1 + M = y_2 + M,
then, in particular, y_1 belongs to y_2 + M, so that y_1 = y_2 + x for some
x in M. Since this means that y_1 − y_2 = x, and since M and N are
disjoint, it follows that x = 0, and hence that y_1 = y_2. (Recall that y_1 − y_2
belongs to N along with y_1 and y_2.) This argument proves that the
correspondence we are studying is one-to-one, as far as it goes. To prove that
it goes far enough, consider an arbitrary coset of M, say z + M. Since
V = N + M, we may write z in the form y + x, with x in M and y in N;
it follows (since x + M = M) that z + M = y + M. This proves that
every coset of M can be obtained by using an element of N (and not just
any old element of V); consequently y → y + M is indeed a one-to-one
correspondence between N and V/M. The linear property of the
correspondence is immediate from the definition of the linear operations in
V/M; indeed, we have

$$(\alpha_1 y_1 + \alpha_2 y_2) + M = \alpha_1 (y_1 + M) + \alpha_2 (y_2 + M).$$

Theorem 2. If M is an m-dimensional subspace of an n-dimensional
vector space V, then V/M has dimension n − m.

Proof. Use § 19, Theorem 2 to find a subspace N so that M ⊕ N = V.
The space N has dimension n − m (by § 19, Theorem 1), and it is
isomorphic to V/M (by Theorem 1 above).

There are more topics in the theory of quotient spaces that we could
discuss (such as their relation to dual spaces and annihilators). Since,
however, most such topics are hardly more than exercises, involving the
use of techniques already at our disposal, we turn instead to some new and
non-obvious ways of manufacturing useful vector spaces.


EXERCISES 

1. Consider the quotient spaces obtained by reducing the space P of polynomials
modulo various subspaces. If M = P_n, is P/M finite-dimensional? What if M
is the subspace consisting of all even polynomials? What if M is the subspace
consisting of all polynomials divisible by x_n (where x_n(t) = t^n)?

2. If S and T are arbitrary subsets of a vector space (not necessarily cosets of a
subspace), there is nothing to stop us from defining S + T just as addition was
defined for cosets, and, similarly, we may define αS (where α is a scalar). If the
class of all subsets of a vector space is endowed with these “linear operations,”
which of the axioms of a vector space are satisfied?

3. (a) Suppose that M is a subspace of a vector space V. Two vectors x and y
of V are congruent modulo M, in symbols x ≡ y (M), if x − y is in M. Prove that
congruence modulo M is an equivalence relation, i.e., that it is reflexive (x ≡ x),
symmetric (if x ≡ y, then y ≡ x), and transitive (if x ≡ y and y ≡ z, then x ≡ z).

(b) If α_1 and α_2 are scalars, and if x_1, x_2, y_1, and y_2 are vectors such that
x_1 ≡ y_1 (M) and x_2 ≡ y_2 (M), then α_1 x_1 + α_2 x_2 ≡ α_1 y_1 + α_2 y_2 (M).

(c) Congruence modulo M splits V into equivalence classes, i.e., into sets such
that two vectors belong to the same set if and only if they are congruent. Prove
that a subset of V is an equivalence class modulo M if and only if it is a coset of M.

4. (a) Suppose that M is a subspace of a vector space V. Corresponding to
every linear functional y on V/M (i.e., to every element y of (V/M)'), there is a
linear functional z on V (i.e., an element of V'); the linear functional z is defined
by z(x) = y(x + M). Prove that the correspondence y → z is an isomorphism
between (V/M)' and M°.

(b) Suppose that M is a subspace of a vector space V. Corresponding to every
coset y + M° of M° in V' (i.e., to every element K of V'/M°), there is a linear
functional z on M (i.e., an element z of M'); the linear functional z is defined by
z(x) = y(x). Prove that z is unambiguously determined by the coset K (that is,
it does not depend on the particular choice of y), and that the correspondence
K → z is an isomorphism between V'/M° and M'.

5. Given a finite-dimensional vector space V, form the direct sum W = V ⊕ V',
and prove that the correspondence (x, y) → (y, x) is an isomorphism between
W and W'.


§ 23. Bilinear forms 

If U and V are vector spaces (over the same field), then their direct sum
W = U ⊕ V is another vector space; we propose to study certain functions
on W. (For present purposes the original definition of U ⊕ V, via ordered
pairs, is the convenient one.) The value of such a function, say w, at an
element (x, y) of W will be denoted by w(x, y). The study of linear
functions on W is no longer of much interest to us; the principal facts
concerning them were discussed in § 20. The functions we want to consider
now are the bilinear ones; they are, by definition, the scalar-valued
functions on W with the property that for each fixed value of either argument
they depend linearly on the other argument. More precisely, a
scalar-valued function w on W is a bilinear form (or bilinear functional) if

$$w(\alpha_1 x_1 + \alpha_2 x_2, y) = \alpha_1 w(x_1, y) + \alpha_2 w(x_2, y)$$

and

$$w(x, \alpha_1 y_1 + \alpha_2 y_2) = \alpha_1 w(x, y_1) + \alpha_2 w(x, y_2),$$

identically in the vectors and scalars involved.

In one special situation we have already encountered bilinear functionals.
If, namely, V is the dual space of U, V = U', and if we write w(x, y) = [x, y]
(see § 14), then w is a bilinear functional on U ⊕ U'. For an example in
a more general situation, let U and V be arbitrary vector spaces (over the
same field, as always), let u and v be elements of U' and V' respectively,
and write w(x, y) = u(x)v(y) for all x in U and y in V. An even more
general example is obtained by selecting a finite number of elements in
U', say u_1, ..., u_k, selecting the same finite number of elements in V',
say v_1, ..., v_k, and writing w(x, y) = u_1(x)v_1(y) + ... + u_k(x)v_k(y). Which
of the words, “functional” or “form,” is used depends somewhat on the
context and, somewhat more, on the user’s whim. In this book we shall
generally use “functional” with “linear” and “form” with “bilinear” (and
its higher-dimensional generalizations).

If w_1 and w_2 are bilinear forms on W, and if α_1 and α_2 are scalars, we
write w for the function on W defined by

$$w(x, y) = \alpha_1 w_1(x, y) + \alpha_2 w_2(x, y).$$

It is easy to see that w is a bilinear form; we denote it by α_1 w_1 + α_2 w_2.
With this definition of the linear operations, the set of all bilinear forms
on W is a vector space. The chief purpose of the remainder of this section
is to determine (in the finite-dimensional case) how the dimension of this
space depends on the dimensions of U and V.

Theorem 1. If U is an n-dimensional vector space with basis {x_1, ..., x_n},
if V is an m-dimensional vector space with basis {y_1, ..., y_m}, and if
{α_{ij}} is any set of nm scalars (i = 1, ..., n; j = 1, ..., m), then there is
one and only one bilinear form w on U ⊕ V such that w(x_i, y_j) = α_{ij} for
all i and j.

Proof. If x = Σ_i ξ_i x_i, y = Σ_j η_j y_j, and w is a bilinear form on U ⊕ V
such that w(x_i, y_j) = α_{ij}, then

$$w(x, y) = \sum_i \sum_j \xi_i \eta_j w(x_i, y_j) = \sum_i \sum_j \xi_i \eta_j \alpha_{ij}.$$

From this equation the uniqueness of w is clear; the existence of a suitable
w is proved by reading the same equation from right to left, that is,
defining w by it. (Compare this result with § 15, Theorem 1.)
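
Reading the equation of the proof from right to left is exactly what one does when a bilinear form is stored as a table of scalars. The following sketch is an illustration under our own assumptions: U and V are coordinate spaces with their standard bases, vectors are lists of coordinates, and indices start at 0 as is usual in Python. The names bilinear_form and unit are hypothetical.

```python
def bilinear_form(alpha):
    """Return w with w(x, y) = sum_i sum_j xi[i] * eta[j] * alpha[i][j]."""
    n, m = len(alpha), len(alpha[0])

    def w(xi, eta):
        return sum(xi[i] * eta[j] * alpha[i][j]
                   for i in range(n) for j in range(m))

    return w

def unit(k, size):
    """The k-th standard basis vector of the given size."""
    return [1 if i == k else 0 for i in range(size)]

alpha = [[1, 2, 3],
         [4, 5, 6]]                  # n = 2, m = 3, so nm = 6 scalars
w = bilinear_form(alpha)
value = w(unit(1, 2), unit(2, 3))    # the prescribed value alpha[1][2]
```

Evaluating w on pairs of basis vectors returns the prescribed scalars, and linearity in each argument can be checked on sample vectors in the same way.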

Theorem 2. If U is an n-dimensional vector space with basis {x_1, ..., x_n},
and if V is an m-dimensional vector space with basis {y_1, ..., y_m}, then
there is a uniquely determined basis {w_{pq}} (p = 1, ..., n; q = 1, ..., m)
in the vector space of all bilinear forms on U ⊕ V with the property that
w_{pq}(x_i, y_j) = δ_{ip} δ_{jq}. Consequently the dimension of the space of
bilinear forms on U ⊕ V is the product of the dimensions of U and V.

Proof. Using Theorem 1, we determine w_{pq} (for each fixed p and q)
by the given condition w_{pq}(x_i, y_j) = δ_{ip} δ_{jq}. The bilinear forms so
determined are linearly independent, since

$$\sum_p \sum_q \alpha_{pq} w_{pq} = 0$$

implies that

$$0 = \sum_p \sum_q \alpha_{pq} w_{pq}(x_i, y_j) = \sum_p \sum_q \alpha_{pq} \delta_{ip} \delta_{jq} = \alpha_{ij}.$$

If, moreover, w is an arbitrary bilinear form on U ⊕ V, and if
w(x_i, y_j) = α_{ij}, then w = Σ_p Σ_q α_{pq} w_{pq}. Indeed, if x = Σ_i ξ_i x_i
and y = Σ_j η_j y_j, then

$$w_{pq}(x, y) = \sum_i \sum_j \xi_i \eta_j \delta_{ip} \delta_{jq} = \xi_p \eta_q,$$

and, consequently,

$$w(x, y) = \sum_i \sum_j \xi_i \eta_j \alpha_{ij} = \sum_p \sum_q \alpha_{pq} w_{pq}(x, y).$$

It follows that the w_{pq} form a basis in the space of bilinear forms; this
completes the proof of the theorem. (Compare this result with § 15,
Theorem 2.)
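
The expansion of a bilinear form in terms of the w_{pq} can also be checked on an example. The sketch below is illustrative only and makes the same assumptions as before: coordinate spaces with standard bases, vectors as lists of coordinates, and indices starting at 0. The names basis_form and from_values are ours.

```python
def basis_form(p, q):
    """w_pq in coordinates: w_pq(x, y) = xi[p] * eta[q]."""
    return lambda xi, eta: xi[p] * eta[q]

def from_values(alpha):
    """Rebuild w from its values alpha[p][q] = w(x_p, y_q) as sum alpha_pq * w_pq."""
    forms = [(alpha[p][q], basis_form(p, q))
             for p in range(len(alpha)) for q in range(len(alpha[0]))]
    return lambda xi, eta: sum(a * f(xi, eta) for a, f in forms)

alpha = [[1, 2, 3],
         [4, 5, 6]]                  # n = 2, m = 3
w = from_values(alpha)
xi, eta = [2, -1], [1, 0, 3]
expanded = w(xi, eta)
direct = sum(xi[i] * eta[j] * alpha[i][j] for i in range(2) for j in range(3))
```

The two values expanded and direct agree, and the number of forms w_{pq} used in the expansion is nm, the product of the two dimensions.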


EXERCISES 

1. (a) If w is a bilinear form on R^n ⊕ R^n, then there exist scalars α_{ij},
i, j = 1, ..., n, such that if x = (ξ_1, ..., ξ_n) and y = (η_1, ..., η_n), then
w(x, y) = Σ_i Σ_j α_{ij} ξ_i η_j. The scalars α_{ij} are uniquely determined by w.

(b) If z is a linear functional on the space of all bilinear forms on R^n ⊕ R^n,
then there exist scalars β_{ij} such that (in the notation of (a))
z(w) = Σ_i Σ_j α_{ij} β_{ij} for every w. The scalars β_{ij} are uniquely
determined by z.

2. A bilinear form w on U ⊕ V is degenerate if, as a function of one of its two
arguments, it vanishes identically for some non-zero value of its other argument;
otherwise it is non-degenerate.

(a) Give an example of a degenerate bilinear form (not identically zero) on
C^2 ⊕ C^2.

(b) Give an example of a non-degenerate bilinear form on C^2 ⊕ C^2.

3. If w is a bilinear form on U ⊕ V, if y_0 is in V, and if a function y is defined
on U by y(x) = w(x, y_0), then y is a linear functional on U. Is it true that if w is
non-degenerate, then every linear functional on U can be obtained this way (by a
suitable choice of y_0)?

4. Suppose that for each x and y in P_n the function w is defined by

(a) w(x, y) = ∫_0^1 x(t) y(t) dt,

(b) w(x, y) = x(1) + y(1),

(c) w(x, y) = x(1) · y(1),

(d) w(x , y) = x(l) i 

In which of these cases is w a bilinear form on P_n ⊕ P_n? In which cases is it
non-degenerate?

5. Does there exist a vector space V and a bilinear form w on V ⊕ V such that
w is not identically zero but w(x, x) = 0 for every x in V?

6. (a) A bilinear form w on V ⊕ V is symmetric if w(x, y) = w(y, x) for all x
and y. A quadratic form on V is a function q on V obtained from a bilinear form
w by writing q(x) = w(x, x). Prove that if the characteristic of the underlying
scalar field is different from 2, then every symmetric bilinear form is uniquely
determined by the corresponding quadratic form. What happens if the
characteristic is 2?

(b) Can a non-symmetric bilinear form define the same quadratic form as a 
symmetric one? 


§ 24. Tensor products 

In this section we shall describe a new method of putting two vector 
spaces together to make a third, namely, the formation of their tensor 
product. Although we shall have relatively little occasion to make use of 
tensor products in this book, their theory is closely allied to some of the 
subjects we shall treat, and it is useful in other related parts of mathe- 
matics, such as the theory of group representations and the tensor calculus. 
The notion is essentially more complicated than that of direct sum; we 
shall therefore begin by giving some examples of what a tensor product 
should be, and the study of these examples will guide us in laying down the 
definition. 

Let U be the set of all polynomials in one variable s, with, say, complex
coefficients; let V be the set of all polynomials in another variable t; and,
finally, let W be the set of all polynomials in the two variables s and t.
With respect to the obvious definitions of the linear operations, U, V, and
W are all complex vector spaces; in this case we should like to call W, or
something like it, the tensor product of U and V. One reason for this
terminology is that if we take any x in U and any y in V, we may form
their product, that is, the element z of W defined by z(s, t) = x(s)y(t).
(This is the ordinary product of two polynomials. Here, as before, we are
doggedly ignoring the irrelevant fact that we may even multiply together
two elements of U, that is, that the product of two polynomials in the same
variable is another polynomial in that variable. Vector spaces in which a
decent concept of multiplication is defined are called algebras, and their
study, as such, lies outside the scope of this book.)

In the preceding example we considered vector spaces whose elements
are functions. We may, if we wish, consider the simple vector space C^n as
a collection of functions also; the domain of definition of the functions is,
in this case, a set consisting of exactly n points, say the first n (strictly)
positive integers. In other words, a vector (ξ_1, ..., ξ_n) may be considered
as a function ξ whose value ξ(i) is defined for i = 1, ..., n; the definition
of the vector operations in C^n is such that they correspond, in the new
notation, to the ordinary operations performed on the functions ξ. If,
simultaneously, we consider C^m as the collection of functions η whose value
η(j) is defined for j = 1, ..., m, then we should like the tensor product of
C^n and C^m to be the set of all functions ζ whose value ζ(i, j) is defined for
i = 1, ..., n and j = 1, ..., m. The tensor product, in other words, is
the collection of all functions defined on a set consisting of exactly nm
objects, and therefore naturally isomorphic to C^{nm}. This example brings out
a property of tensor products, namely, the multiplicativity of dimension,
that we should like to retain in the general case.
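
The second example can be tried out on a machine. The following is only a
sketch under assumptions of our own: vectors are stored as Python dictionaries
(functions on a finite index set), and the helper name `product` is introduced
here for illustration. It forms the function ζ(i, j) = ξ(i)η(j) and checks that
it lives on exactly nm points and depends linearly on its first factor.

```python
# Sketch: a vector in C^n is stored as a function on {1, ..., n}, i.e. a dict.
def product(xi, eta):
    """Return zeta with zeta(i, j) = xi(i) * eta(j) (hypothetical helper)."""
    return {(i, j): a * b for i, a in xi.items() for j, b in eta.items()}

xi = {1: 2, 2: -1, 3: 1j}    # a vector in C^3
xi2 = {1: 1, 2: 1, 3: 1}     # another vector in C^3
eta = {1: 5, 2: 7}           # a vector in C^2
zeta = product(xi, eta)

# zeta is defined on exactly n*m = 3*2 points.
print(len(zeta))

# Linearity in the first factor: (3 xi + 4 xi2) * eta = 3 (xi * eta) + 4 (xi2 * eta).
lhs = product({i: 3 * xi[i] + 4 * xi2[i] for i in xi}, eta)
rhs = {key: 3 * zeta[key] + 4 * product(xi2, eta)[key] for key in zeta}
print(lhs == rhs)
```

The dictionary keys play the role of the nm objects on which the functions of
the tensor product are defined.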

Let us now try to abstract the most important properties of these exam- 
ples. The definition of direct sum was one possible rigorization of the crude 
intuitive idea of writing down, formally, the sum of two vectors belonging 
to different vector spaces. Similarly, our examples suggest that the tensor 
product U ⊗ V of two vector spaces U and V should be such that to every
x in U and y in V there corresponds a “product” z = x ⊗ y in U ⊗ V, in
such a way that the correspondence between x and z, for each fixed y, as
well as the correspondence between y and z, for each fixed x, is linear.
(This means, of course, that (α_1 x_1 + α_2 x_2) ⊗ y should be equal to
α_1(x_1 ⊗ y) + α_2(x_2 ⊗ y), and that a similar equation should hold for
x ⊗ (α_1 y_1 + α_2 y_2).) To put it more simply, x ⊗ y should define a bilinear
(vector-valued) function of x and y.

The notion of formal multiplication suggests also that if u and v are
linear functionals on U and V respectively, then it is their product w,
defined by w(x, y) = u(x)v(y), that should be in some sense the general
element of the dual space (U ⊗ V)'. Observe that this product is a bilinear
(scalar-valued) function of x and y.


§ 25. Product bases

After one more word of preliminary explanation we shall be ready to
discuss the formal definition of tensor products. It turns out to be
technically preferable to get at U ⊗ V indirectly, by defining it as the dual
of another space; we shall make tacit use of reflexivity to obtain U ⊗ V
itself. Since we have proved reflexivity for finite-dimensional spaces only,
we shall restrict the definition to such spaces.

Definition. The tensor product U ⊗ V of two finite-dimensional vector
spaces U and V (over the same field) is the dual of the vector space of
all bilinear forms on U ⊕ V. For each pair of vectors x and y, with x in
U and y in V, the tensor product z = x ⊗ y of x and y is the element of
U ⊗ V defined by z(w) = w(x, y) for every bilinear form w.

This definition is one of the quickest rigorous approaches to the theory, 
but it does lead to some unpleasant technical complications later. What- 
ever its disadvantages, however, we observe that it obviously has the two 
desired properties : it is clear, namely, that dimension is multiplicative (see 
§ 23, Theorem 2, and § 15, Theorem 2), and it is clear that x ® y depends 
linearly on each of its factors. 

Another possible (and deservedly popular) definition of tensor product 
is by formal products. According to that definition U ⊗ V is obtained by
considering all symbols of the form Σ_i α_i(x_i ⊗ y_i), and, within the set of
such symbols, making the identifications demanded by the linearity of the 
vector operations and the bilinearity of tensor multiplication. (For the 
purist: in this definition x ® y stands merely for the ordered pair of x and 
y; the multiplication sign is just a reminder of what to expect.) Neither 
definition is simple; we adopted the one we gave because it seemed more in 
keeping with the spirit of the rest of the book. The main disadvantage of 
our definition is that it does not readily extend to the most useful generali- 
zations of finite-dimensional vector spaces, that is, to modules and to in- 
finite-dimensional spaces. 

For the present we prove only one theorem about tensor products. The 
theorem is a further justification of the product terminology, and, inciden- 
tally, it is a sharpening of the assertion that dimension is multiplicative. 

Theorem. If X = {x_1, ..., x_n} and Y = {y_1, ..., y_m} are bases in U
and V respectively, then the set Z of vectors z_ij = x_i ⊗ y_j (i = 1, ..., n;
j = 1, ..., m) is a basis in U ⊗ V.

Proof. Let w_pq be the bilinear form on U ⊕ V such that w_pq(x_i, y_j)
= δ_ip δ_jq (i, p = 1, ..., n; j, q = 1, ..., m); the existence of such bilinear
forms, and the fact that they constitute a basis for all bilinear forms, follow
from § 23, Theorem 2. Let {w'_pq} be the dual basis in U ⊗ V, so that
[w_ij, w'_pq] = δ_ip δ_jq. If w = Σ_p Σ_q α_pq w_pq is an arbitrary bilinear
form on U ⊕ V, then

    w'_ij(w) = [w, w'_ij] = Σ_p Σ_q α_pq [w_pq, w'_ij]
             = α_ij = w(x_i, y_j) = z_ij(w).

The conclusion follows from the fact that the vectors w'_ij do constitute a
basis of U ⊗ V.
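
A small numerical check may make the theorem more tangible. The sketch below
rests on assumptions of our own: the spaces are R^2 and R^3 with their
coordinate bases, a bilinear form is represented by a matrix, and the helper
names `bilinear_form`, `tensor`, and `unit` are hypothetical. It evaluates
x ⊗ y on a bilinear form directly and then through the expansion
Σ ξ_i η_j (x_i ⊗ y_j) in the product basis.

```python
def bilinear_form(a):
    """Bilinear form w(x, y) = sum over i, j of a[i][j] * x[i] * y[j]."""
    def w(x, y):
        return sum(a[i][j] * x[i] * y[j]
                   for i in range(len(x)) for j in range(len(y)))
    return w

def tensor(x, y):
    """x (tensor) y, regarded as the functional w -> w(x, y) on bilinear forms."""
    return lambda w: w(x, y)

def unit(n, i):
    """The i-th coordinate basis vector of R^n (0-based index)."""
    return [1 if r == i else 0 for r in range(n)]

x, y = [2, 3], [1, 0, -1]
w = bilinear_form([[1, 2, 3], [4, 5, 6]])   # an arbitrary test form

direct = tensor(x, y)(w)
expanded = sum(x[i] * y[j] * tensor(unit(2, i), unit(3, j))(w)
               for i in range(2) for j in range(3))
print(direct, expanded)
```

The two numbers agree for every choice of the matrix, which is exactly the
statement that the coordinates of x ⊗ y in the product basis are the products
ξ_i η_j.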


EXERCISES 

1. If x = (1, 1) and y = (1, 1, 1) are vectors in R^2 and R^3 respectively, find the
coordinates of x ⊗ y in R^2 ⊗ R^3 with respect to the product basis {x_i ⊗ y_j},
where x_i = (δ_i1, δ_i2) and y_j = (δ_1j, δ_2j, δ_3j).

2. Let P_{n,m} be the space of all polynomials z with complex coefficients, in two
variables s and t, such that either z = 0 or else the degree of z(s, t) is ≤ m − 1
for each fixed s and ≤ n − 1 for each fixed t. Prove that there exists an
isomorphism between P_n ⊗ P_m and P_{n,m} such that the element z of P_{n,m} that
corresponds to x ⊗ y (x in P_n, y in P_m) is given by z(s, t) = x(s)y(t).

3. To what extent is the formation of tensor products commutative and
associative? What about the distributive law U ⊗ (V ⊕ W) = (U ⊗ V) ⊕ (U ⊗ W)?

4. If V is a finite-dimensional vector space, and if x and y are in V, is it true
that x ⊗ y = y ⊗ x?

5. (a) Suppose that V is a finite-dimensional real vector space, and let U be
the set C of all complex numbers regarded as a (two-dimensional) real vector
space. Form the tensor product V^+ = U ⊗ V. Prove that there is a way of
defining products of complex numbers with elements of V^+ so that α(x ⊗ y)
= αx ⊗ y whenever α and x are in C and y is in V.

(b) Prove that with respect to vector addition, and with respect to complex
scalar multiplication as defined in (a), the space V^+ is a complex vector space.

(c) Find the dimension of the complex vector space V^+ in terms of the
dimension of the real vector space V.

(d) Prove that the vector space V is isomorphic to a subspace in V^+ (when the
latter is regarded as a real vector space).

The moral of this exercise is that not only can every complex vector space be
regarded as a real vector space, but, in a certain sense, the converse is true. The
vector space V^+ is called the complexification of V.

6. If U and V are finite-dimensional vector spaces, what is the dual space of
U' ⊗ V'?


§ 26. Permutations

The main subject of this book is usually known as linear algebra. In the 
last three sections, however, the emphasis was on something called multi- 
linear algebra. It is hard to say exactly where the dividing line is between 
the two subjects. Since, in any case, both are quite extensive, it would not
be practical to try to stuff a detailed treatment of both into the same
volume. Nor is it desirable to discuss linear algebra in its absolutely pure
state; the addition of even a small part of the multilinear theory (such as
is involved in the modern view of tensor products and determinants)
extends the domain of applicability of the linear theory pleasantly out of
proportion with the effort involved. We propose, accordingly, to continue 
the study of multilinear algebra; our intention is to draw a more or less 
straight line between what we already know and the basic facts about de- 
terminants. With that in mind, we shall devote three sections to the dis- 
cussion of some simple facts about combinatorics; the connection between 
those facts and multilinear algebra will appear immediately after that 
discussion. 

By a permutation of the integers between 1 and k (inclusive) we shall 
mean a one-to-one transformation that assigns to each such integer another 
one (or possibly the same one). To say that the transformation π is
one-to-one means, of course, that if π(1), ..., π(k) are the integers that π
assigns to 1, ..., k, respectively, then π(i) = π(j) can happen only in case
i = j. Since this implies that both the sets {1, ..., k} and {π(1), ..., π(k)}
consist of exactly k elements, it follows that they consist of exactly the
same elements. From this, in turn, we infer that a permutation π of the
set {1, ..., k} maps that set onto itself, that is, that if 1 ≤ j ≤ k, then
there exists at least one i (and, in fact, exactly one) such that π(i) = j.
The total number of the integers under consideration, namely, k, will be 
held fixed throughout the following discussion. 

The theory of permutations, like everything else, is best understood by 
staring hard at some non-trivial examples. Before presenting any exam- 
ples, however, we shall first mention some of the general things that can be 
done with permutations; by this means the examples will illustrate not only 
the basic concept but also its basic properties. 

If σ and τ are arbitrary permutations, a permutation (to be denoted by
στ) can be defined by writing

    (στ)(i) = σ(τ(i))

for each i. To prove that στ is indeed a permutation, observe that if
(στ)(i) = (στ)(j), then τ(i) = τ(j) (since σ is one-to-one), and therefore
i = j (since τ is one-to-one). The permutation στ is called the product of
the permutations σ and τ. Warning: the order is important. In general
στ ≠ τσ, or, in other words, permutation multiplication is not commutative.
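
The warning about order is easy to confirm by computation. In the following
sketch (the tuple representation and the function name `product` are our own
assumptions, not notation from the text) a permutation π of {1, ..., k} is
stored as the tuple (π(1), ..., π(k)), and the product is formed by applying
the right-hand factor first.

```python
def product(sigma, tau):
    """Return sigma tau, defined by (sigma tau)(i) = sigma(tau(i))."""
    return tuple(sigma[tau[i] - 1] for i in range(len(tau)))

sigma = (2, 1, 3)   # sigma(1) = 2, sigma(2) = 1, sigma(3) = 3
tau = (1, 3, 2)     # tau(1) = 1, tau(2) = 3, tau(3) = 2

# The two products differ, so multiplication is not commutative.
print(product(sigma, tau), product(tau, sigma))
```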

Multiplication of permutations is associative; that is, if π, σ, and τ are
permutations, then

(1)    (πσ)τ = π(στ).

To prove this, we must show that

    ((πσ)τ)(i) = (π(στ))(i)

for every i. The proof consists of several applications of the definition of
product, as follows:

    ((πσ)τ)(i) = (πσ)(τ(i)) = π(σ(τ(i))),

and

    (π(στ))(i) = π((στ)(i)) = π(σ(τ(i))).

In view of this result we may and shall omit parentheses in writing the
product of three or more permutations. The result also enables us to prove
the obvious laws of exponents. The powers of a permutation π are defined
inductively by writing π^1 = π and π^{p+1} = π·π^p for all p = 1, 2, 3, ...;
the associative law implies that π^p π^q = π^{p+q} and (π^p)^q = π^{pq} for all
p and q. Observe that any two powers of a permutation commute with each
other, that is, that π^p π^q = π^q π^p.

The simplest permutation is the identity (to be denoted by ε); it is defined
by ε(i) = i for each i. If π is an arbitrary permutation, then

(2)    επ = πε = π,

or, in other words, multiplication by ε leaves every permutation unaffected.
The proof is straightforward; for every i we have

    (επ)(i) = ε(π(i)) = π(i)

and

    (πε)(i) = π(ε(i)) = π(i).

The permutation ε behaves, from the point of view of multiplication, like
the number 1. In analogy with the usual numerical convention, the zero-th
power of every permutation π is defined by writing π^0 = ε.

If π is an arbitrary permutation, then there exists a permutation (to be
denoted by π^{-1}) such that

(3)    π^{-1}π = ππ^{-1} = ε.

To define π^{-1}(j), where, of course, 1 ≤ j ≤ k, find the unique i such that
π(i) = j, and write π^{-1}(j) = i; the validity of (3) is an immediate
consequence of the definitions. The permutation π^{-1} is called the inverse of π.

Let S_k be the set of all permutations of the integers between 1 and k.
What we have proved so far is that an operation of multiplication can be
defined for the elements of S_k so that (1) multiplication is associative, (2)
there exists an identity element, that is, an element such that
multiplication by it leaves every element of S_k fixed, and (3) every element
has an inverse, that is, an element whose product with the given one is the
identity. A set satisfying (1)-(3) is called a group with respect to the concept
of product that those conditions refer to; the set S_k, in particular, is called
the symmetric group of degree k. Observe that the integers 1, ..., k could
be replaced by any k distinct objects without affecting any of the concepts
defined above; the change would be merely a notational matter.
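
For a small value of k the three group properties can be verified exhaustively.
The sketch below does this for k = 3; the tuple representation (π(1), ..., π(k))
and the helper names `product` and `inverse` are assumptions made for the
illustration.

```python
from itertools import permutations

def product(sigma, tau):
    """(sigma tau)(i) = sigma(tau(i)); pi is stored as (pi(1), ..., pi(k))."""
    return tuple(sigma[tau[i] - 1] for i in range(len(tau)))

def inverse(pi):
    """Return the permutation that sends pi(i) back to i."""
    inv = [0] * len(pi)
    for i, j in enumerate(pi, start=1):
        inv[j - 1] = i          # pi(i) = j, so the inverse sends j to i
    return tuple(inv)

k = 3
S_k = list(permutations(range(1, k + 1)))
identity = tuple(range(1, k + 1))

associative = all(product(product(p, s), t) == product(p, product(s, t))
                  for p in S_k for s in S_k for t in S_k)
has_identity = all(product(identity, p) == p == product(p, identity)
                   for p in S_k)
has_inverses = all(product(inverse(p), p) == identity == product(p, inverse(p))
                   for p in S_k)
print(associative, has_identity, has_inverses)
```

An exhaustive check of this kind proves nothing for general k, of course; it
merely confirms that the definitions have been transcribed correctly.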


§ 27. Cycles 

A simple example of a permutation is obtained as follows: choose any 
two distinct integers between 1 and k , say, p and q, and write 

    τ(p) = q,

    τ(q) = p,

    τ(i) = i whenever i ≠ p and i ≠ q.

The permutation τ so defined is denoted by (p, q); every permutation of
this form is called a transposition. If τ is a transposition, then τ^2 = ε.

Another useful way of constructing examples is to choose p distinct
integers between 1 and k, say, i_1, ..., i_p, and to write

    σ(i_j) = i_{j+1} whenever 1 ≤ j < p,

    σ(i_p) = i_1,

    σ(i) = i whenever i ≠ i_1, ..., i ≠ i_p.

The permutation σ so defined is denoted by (i_1, ..., i_p). If p = 1, then
σ = ε; if p = 2, then σ is a transposition. For any p with 1 < p ≤ k,
every permutation of the form (i_1, ..., i_p) is called a p-cycle, or simply a
cycle; the 2-cycles are exactly the transpositions. Warning: it is not
assumed that i_1 < ... < i_p. If, for instance, k = 5 and p = 3, then there
are twenty distinct cycles. Observe also that the notation for cycles is not
unique; the symbols (1, 2, 3), (2, 3, 1), and (3, 1, 2) all denote the same
permutation. Two cycles (i_1, ..., i_p) and (j_1, ..., j_q) are disjoint if none
of the i's is equal to any of the j's. If σ and τ are disjoint cycles, then στ
= τσ, or, in other words, σ and τ commute.
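
The assertions of the preceding paragraph can be checked mechanically. The
following sketch assumes, as a convention of our own, that a permutation π is
stored as the tuple (π(1), ..., π(k)); the helper `cycle` is a hypothetical
name for the map that turns cycle notation into such a tuple.

```python
from itertools import permutations

def product(sigma, tau):
    """(sigma tau)(i) = sigma(tau(i)); pi is stored as (pi(1), ..., pi(k))."""
    return tuple(sigma[tau[i] - 1] for i in range(len(tau)))

def cycle(k, *items):
    """The permutation of {1, ..., k} denoted by (items[0], ..., items[-1])."""
    pi = list(range(1, k + 1))
    for a, b in zip(items, items[1:] + items[:1]):
        pi[a - 1] = b     # each listed integer goes to the next, the last to the first
    return tuple(pi)

# The notation is not unique: three symbols, one permutation.
same = cycle(5, 1, 2, 3) == cycle(5, 2, 3, 1) == cycle(5, 3, 1, 2)

# The distinct 3-cycles on {1, ..., 5}, collected as a set of tuples.
three_cycles = {cycle(5, *triple) for triple in permutations(range(1, 6), 3)}

# Disjoint cycles commute.
sigma, tau = cycle(5, 1, 2, 3), cycle(5, 4, 5)
commute = product(sigma, tau) == product(tau, sigma)

print(same, len(three_cycles), commute)
```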

Theorem 1. Every permutation is the product of pairwise disjoint cycles. 

Proof. If π is a permutation and if i is such that π(i) ≠ i (assume, for
the moment, that π ≠ ε), form the sequence (i, π(i), π^2(i), ...). Since
there are only a finite number of distinct integers between 1 and k, there
must exist exponents p and q (0 ≤ p < q) such that π^p(i) = π^q(i). The
one-to-one character of π implies that π^{q−p}(i) = i, or, with an obvious
change of notation, what we have proved is that there must exist a strictly
positive exponent p such that π^p(i) = i. If p is selected to be the smallest
exponent with this property, then the integers i, ..., π^{p−1}(i) are distinct
from each other. (Indeed, if 0 ≤ q < r < p and π^q(i) = π^r(i), then π^{r−q}(i)
= i, contradicting the minimality of p.) It follows that (i, ..., π^{p−1}(i))
is a p-cycle. If there is a j between 1 and k different from each of i, ...,
π^{p−1}(i) and different from π(j), we repeat the procedure that led us to this
cycle, with j in place of i. We continue forming cycles in this manner as
long as after each step we can still find a new integer that π does not send
on itself; the product of the disjoint cycles so constructed is π. The case
π = ε is covered by the rather natural agreement that a product with no
factors, an “empty product,” is to be interpreted as the identity
permutation.
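
The proof is constructive, and the construction can be written out as a short
procedure. The sketch below is one possible transcription, not the only one;
the tuple representation (π(1), ..., π(k)) and the function name
`disjoint_cycles` are assumptions of ours. Fixed points are skipped, so the
identity yields the empty product.

```python
def disjoint_cycles(pi):
    """Factor pi, stored as (pi(1), ..., pi(k)), into disjoint cycles of length > 1."""
    seen, cycles = set(), []
    for i in range(1, len(pi) + 1):
        if i in seen or pi[i - 1] == i:
            continue                 # already in a cycle, or sent on itself
        orbit, j = [], i
        while j not in seen:         # follow i, pi(i), pi^2(i), ... until i recurs
            seen.add(j)
            orbit.append(j)
            j = pi[j - 1]
        cycles.append(tuple(orbit))
    return cycles

# pi sends 1 -> 3 -> 4 -> 1, interchanges 2 and 5, and leaves 6 fixed.
print(disjoint_cycles((3, 5, 4, 1, 2, 6)))
```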

Theorem 2. Every cycle is a product of transpositions.

Proof. Suppose that σ is a p-cycle; for the sake of notational simplicity,
we shall give the proof, which is perfectly general, in the special case p = 5.
The proof itself consists of one line:

    (i_1, i_2, i_3, i_4, i_5) = (i_1, i_5)(i_1, i_4)(i_1, i_3)(i_1, i_2).

A few added words of explanation might be helpful. In view of the
definition of the product of permutations, the right side of the last equation
operates on each integer between 1 and k from the inside out, or, perhaps
more suggestively, from right to left. Thus, for example, the result of
applying (i_1, i_5)(i_1, i_4)(i_1, i_3)(i_1, i_2) to i_3 is calculated as follows:
(i_1, i_2)(i_3) = i_3, (i_1, i_3)(i_3) = i_1, (i_1, i_4)(i_1) = i_4,
(i_1, i_5)(i_4) = i_4, so that
(i_1, i_5)(i_1, i_4)(i_1, i_3)(i_1, i_2)(i_3) = i_4.
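
The one-line proof can be tested on a particular cycle. In the sketch below
the helper names `cycle` and `as_transpositions`, and the choice of the cycle
(2, 5, 3, 7, 4) in S_7, are ours; the factors are multiplied in the order in
which they are written, so that the rightmost one acts first.

```python
from functools import reduce

def product(sigma, tau):
    """(sigma tau)(i) = sigma(tau(i)); pi is stored as (pi(1), ..., pi(k))."""
    return tuple(sigma[tau[i] - 1] for i in range(len(tau)))

def cycle(k, *items):
    """The permutation of {1, ..., k} denoted by (items[0], ..., items[-1])."""
    pi = list(range(1, k + 1))
    for a, b in zip(items, items[1:] + items[:1]):
        pi[a - 1] = b
    return tuple(pi)

def as_transpositions(items):
    """(i1, ..., ip) -> [(i1, ip), ..., (i1, i3), (i1, i2)], in written order."""
    return [(items[0], b) for b in reversed(items[1:])]

k, items = 7, (2, 5, 3, 7, 4)
factors = as_transpositions(items)
result = reduce(product, (cycle(k, *t) for t in factors))
print(factors, result == cycle(k, *items))
```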

For the sake of reference we put on record the following immediate corol- 
lary of the two preceding theorems. 

Theorem 3. Every permutation is a product of transpositions.

Observe that the transpositions in Theorems 2 and 3 were not asserted 
to be disjoint; in general they are not. 


EXERCISES 

1. (a) How many permutations are there in S_k?

(b) How many distinct p-cycles are there in S_k (1 ≤ p ≤ k)?

2. If σ and τ are permutations (in S_k), then (στ)^{-1} = τ^{-1}σ^{-1}.

3. (a) If σ and τ are permutations (in S_k), then there exists a unique permutation
π such that σπ = τ.

(b) If π, σ, and τ are permutations such that πσ = πτ, then σ = τ.

4. Give an example of a permutation that is not the product of disjoint
transpositions.

5. Prove that every permutation in S_k is the product of transpositions of the
form (j, j + 1), where 1 ≤ j < k. Is this factorization unique?

6. Is the inverse of a cycle also a cycle? 

7. Prove that the representation of a permutation as the product of disjoint 
cycles is unique except possibly for the order of the factors. 

8. The order of a permutation π is the least integer p (> 0) such that π^p = ε.

(a) Every permutation has an order.

(b) What is the order of a p-cycle?

(c) If σ is a p-cycle, τ is a q-cycle, and σ and τ are disjoint, what is the order of
στ?

(d) Give an example to show that the assumption of disjointness is essential in
(c).

(e) If π is a permutation of order p and if π^q = ε, then q is divisible by p.

9. Every permutation in S_k (k > 1) can be written as a product, each factor of
which is one of the transpositions (1, 2), (1, 3), (1, 4), ..., (1, k).

10. Two permutations σ and τ are called conjugate if there exists a permutation
π such that σπ = πτ. Prove that σ and τ are conjugate if and only if they have
the same cycle structure. (This means that in the representation of σ as a product
of disjoint cycles, the number of p-cycles is, for each p, the same as the
corresponding number for τ.)


§ 28. Parity

Since (1, 3)(1, 2) = (1, 2)(2, 3)(= (1, 2, 3)), we see that the representa- 
tion of a permutation (even a cycle) as a product of transpositions is not 
necessarily unique. Since (1, 3)(1, 4)(1, 2)(3, 4)(3, 2) = (1, 4) (1, 3)(1, 2) 
(= (1, 2, 3, 4)), we see that even the number of transpositions needed to 
factor a cycle is not necessarily unique. There is, nevertheless, something 
unique about the factorization, namely, whether the number of transposi- 
tions needed is even or odd. We proceed to state this result precisely, and 
to prove it. 

Assume, for simplicity of notation, that k = 4. Let f be the polynomial
(in four variables t_1, t_2, t_3, t_4) defined by

    f(t_1, t_2, t_3, t_4) = (t_1 − t_2)(t_1 − t_3)(t_1 − t_4)(t_2 − t_3)(t_2 − t_4)(t_3 − t_4).

(In the general case f is the product of all the differences t_i − t_j with
1 ≤ i < j ≤ k.) Each permutation π in S_4 converts f into a new
polynomial, denoted by πf; by definition

    (πf)(t_1, t_2, t_3, t_4) = f(t_{π(1)}, t_{π(2)}, t_{π(3)}, t_{π(4)}).

In words: to obtain πf, replace each variable in f by the one whose subscript
is obtained by allowing π to act on the subscript of the given one. If, for
instance, τ = (2, 4), then

    (τf)(t_1, t_2, t_3, t_4) = (t_1 − t_4)(t_1 − t_3)(t_1 − t_2)(t_4 − t_3)(t_4 − t_2)(t_3 − t_2).

If σ = (1, 3, 2, 4), so that στ = (1, 3, 2), then both (σ(τf))(t_1, t_2, t_3, t_4) and
((στ)f)(t_1, t_2, t_3, t_4) are equal to

    (t_3 − t_1)(t_3 − t_2)(t_3 − t_4)(t_1 − t_2)(t_1 − t_4)(t_2 − t_4).

These computations illustrate, and indicate the proofs of, three
important facts. (1) For every permutation π, the factors of πf are the same as
the factors of f, except possibly for sign and order; consequently πf = f or
else πf = −f. The permutation π is called even if πf = f and odd if πf = −f.
The signum (or sign) of a permutation π, denoted by sgn π, is +1 or −1
according as π is even or odd, so that we always have πf = (sgn π)f. The
fact that π is even, or odd, is sometimes expressed by saying that the parity
of π is even, or odd, respectively. (2) If τ is a transposition, then sgn τ =
−1, or, equivalently, every transposition is odd. The proof is the obvious
generalization of the following reasoning about the special example (2, 4).
Exactly one factor of f contains both t_2 and t_4, and that one changes sign
in the passage from f to τf. If a factor contains neither t_2 nor t_4, it stays
fixed. The factors containing only one of t_2 and t_4 come in pairs (such as
the pair (t_2 − t_3) and (t_3 − t_4), or the pair (t_1 − t_2) and (t_1 − t_4)). Each
factor in such a pair goes into the other factor, except possibly that its
sign may change; if it changes for one factor, it will change for its mate.
(3) If σ and τ are permutations, then (στ)f = σ(τf); consequently στ is even
if and only if σ and τ have the same parity. Observe that sgn (στ) =
(sgn σ)(sgn τ).
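
Since πf is always f or −f, the sign of π can be read off by evaluating both
polynomials at a single point where f does not vanish. The following sketch
does so for k = 4; the evaluation point (1, 2, 3, 4), the tuple representation
(π(1), ..., π(k)), and the function names are our own assumptions.

```python
from itertools import permutations
from math import prod

def product(sigma, tau):
    """(sigma tau)(i) = sigma(tau(i)); pi is stored as (pi(1), ..., pi(k))."""
    return tuple(sigma[tau[i] - 1] for i in range(len(tau)))

def f(t):
    """Product of all the differences t_i - t_j with i < j."""
    k = len(t)
    return prod(t[i] - t[j] for i in range(k) for j in range(i + 1, k))

def sgn(pi):
    """Compare (pi f)(t) = f(t_pi(1), ..., t_pi(k)) with f(t) at t = (1, ..., k)."""
    t = tuple(range(1, len(pi) + 1))          # distinct values, so f(t) != 0
    permuted = tuple(t[pi[i] - 1] for i in range(len(pi)))
    return f(permuted) // f(t)

tau = (1, 4, 3, 2)      # the transposition (2, 4)
sigma = (3, 4, 2, 1)    # the cycle (1, 3, 2, 4)

S4 = list(permutations(range(1, 5)))
multiplicative = all(sgn(product(a, b)) == sgn(a) * sgn(b)
                     for a in S4 for b in S4)
print(sgn(tau), sgn(sigma), sgn(product(sigma, tau)), multiplicative)
```

Here the product στ comes out as the 3-cycle (1, 3, 2), in agreement with the
computation in the text.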

It follows from (2) and (3) that a product of a bunch of transpositions
is even if and only if there are an even number of them, and it is odd
otherwise. (Note, in particular, by looking at the proof of § 27, Theorem 2,
that a p-cycle is even if and only if p is odd; in other words, if σ is a p-cycle,
then sgn σ = (−1)^{p+1}.) Conclusion: no matter how a permutation π is
factored into transpositions, the number of factors is always even (this is
the case if π is even), or else it is always odd (this is the case if π is odd).

The product of two even permutations is even; the inverse of an even
permutation is even; the identity permutation is even. These facts are
summed up by saying that the set of all even permutations is a subgroup
of S_k; this subgroup (to be denoted by A_k) is called the alternating group of
degree k.


EXERCISES

1. How many permutations are there in A_k?

2. Give examples of even permutations with even order and even permutations
with odd order; do the same for odd permutations.

3. Every permutation in A_k (k > 2) can be written as a product, each factor of
which is one of the 3-cycles (1, 2, 3), (1, 2, 4), ..., (1, 2, k).


§ 29. Multilinear forms 

We are now ready to proceed with multilinear algebra. The basic con- 
cept is that of multilinear form (or functional), an easy generalization of 
the concept of bilinear form. Suppose that V_1, ..., V_k are vector spaces
(over the same field); a k-linear form (k = 1, 2, 3, ...) is a scalar-valued
function on the direct sum V_1 ⊕ ... ⊕ V_k with the property that for each
fixed value of any k − 1 arguments it depends linearly on the remaining
argument. The 1-linear forms are simply the linear functionals (on V_1), and
the 2-linear forms are the bilinear forms (on V_1 ⊕ V_2). The 3-linear (or
trilinear) forms are the scalar-valued functions w (on V_1 ⊕ V_2 ⊕ V_3) such
that

    w(α_1 x_1 + α_2 x_2, y, z) = α_1 w(x_1, y, z) + α_2 w(x_2, y, z),

and such that similar identities hold for w(x, α_1 y_1 + α_2 y_2, z) and
w(x, y, α_1 z_1 + α_2 z_2). A function that is k-linear for some k is called a
multilinear form.

Much of the theory of bilinear forms extends easily to the multilinear 
case. Thus, for instance, if w_1 and w_2 are k-linear forms, if α_1 and α_2 are
scalars, and if w is defined by

    w(x_1, ..., x_k) = α_1 w_1(x_1, ..., x_k) + α_2 w_2(x_1, ..., x_k)

whenever x_i is in V_i, i = 1, ..., k, then w is a k-linear form, denoted by
α_1 w_1 + α_2 w_2. The set of all k-linear forms is a vector space with respect
to this definition of the linear operations; the dimension of that vector
space is the product n_1 ··· n_k, where, of course, n_i is the dimension of V_i.
The proofs of all these statements are just like the proofs (in § 23) of the
corresponding statements for the bilinear case. We could go on imitating
the bilinear theory and, in particular, studying multiple tensor products.
In order to hold our multilinear digression to a minimum, we shall proceed 
instead in a different, more special, and, for our purposes, more useful 
direction. 

In what follows we shall restrict our attention to the case in which the
k spaces V_i are all equal to one and the same vector space, say, V; we shall
assume that V is finite-dimensional. In this case we shall call a “k-linear
form on V_1 ⊕ ... ⊕ V_k” simply a “k-linear form on V,” or, even more
simply, a “k-linear form”; the language is slightly inaccurate but, in
context, completely unambiguous. If the dimension of V is n, then the
dimension of the vector space of all k-linear forms is n^k. The space V and, of
course, the dimension n will be held fixed throughout the following
discussion.

The special character of the case we are studying enables us to apply a 
technique that is not universally available; the technique is to operate on 
k-linear forms by permutations in S_k. If w is a k-linear form, and if π is in
S_k, we write

    πw(x_1, ..., x_k) = w(x_{π(1)}, ..., x_{π(k)})

whenever x_1, ..., x_k are in V. The function πw so defined is again a k-linear
form. (The value of πw at (x_1, ..., x_k) is more honestly denoted by
(πw)(x_1, ..., x_k); since, however, the simpler notation does not appear to
lead to any confusion, we shall continue to use it.)

Using the way permutations act on k-linear forms, we can define some
interesting sets of such forms. Thus, for instance, a k-linear form w is
called symmetric if πw = w for every permutation π in S_k. (Note that if
k = 1, then this condition is trivially satisfied.) The set of all symmetric
k-linear forms is a subspace of the space of all k-linear forms. Hence, in
particular, the origin of that space, the k-linear form 0, is symmetric. For
a non-trivial example, suppose that k = 2, let y_1 and y_2 be linear
functionals on V, and write

    w(x_1, x_2) = y_1(x_1)y_2(x_2) + y_1(x_2)y_2(x_1).

This procedure for constructing k-linear forms has useful generalizations.
Thus, for instance, if 1 ≤ h < k ≤ n, and if u is an h-linear form and v is
a (k − h)-linear form, then the equation

    w(x_1, ..., x_k) = u(x_1, ..., x_h)·v(x_{h+1}, ..., x_k)

defines a k-linear form w, which, in general, is not symmetric. A symmetric
k-linear form can be obtained from w (or, for that matter, from any given
k-linear form) by forming Σ πw, where the summation is extended over
all permutations π in S_k.
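
The symmetrizing sum is easy to carry out numerically. In the sketch below,
which is an illustration under assumptions of our own, V is R^2, forms are
ordinary Python functions of k vector arguments, and the names `act` and
`symmetrize` as well as the particular forms u and v are hypothetical.

```python
from itertools import permutations

def act(pi, w):
    """Return pi w, where (pi w)(x_1, ..., x_k) = w(x_pi(1), ..., x_pi(k))."""
    return lambda *xs: w(*(xs[pi[i] - 1] for i in range(len(pi))))

def symmetrize(w, k):
    """The sum of pi w over all permutations pi in S_k."""
    perms = list(permutations(range(1, k + 1)))
    return lambda *xs: sum(act(pi, w)(*xs) for pi in perms)

# Example with h = 1 and k = 3: u is 1-linear, v is 2-linear, w = u * v.
u = lambda x: x[0]
v = lambda y, z: y[0] * z[1]
w = lambda x, y, z: u(x) * v(y, z)

s = symmetrize(w, 3)
a, b, c = (1, 2), (3, 5), (7, 11)
print(w(a, b, c), w(a, c, b))                 # w itself is not symmetric
print(s(a, b, c), s(b, a, c), s(c, b, a))     # the sum over S_3 is
```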

We shall not study symmetric k-linear forms any more. We introduced
them here because they constitute a very natural class of functions
definable in terms of permutations. We abandon them now in favor of
another class of functions, which play a much greater role in the theory.


§ 30. Alternating forms

A k-linear form w is skew-symmetric if πw = −w for every odd
permutation π in S_k. Equivalently, w is skew-symmetric if πw = (sgn π)w for
every permutation π in S_k. (If πw = (sgn π)w for all π, then, in particular,
πw = −w whenever π is odd. If, conversely, πw = −w for all odd π, then,
given an arbitrary π, factor it into transpositions, say, π = τ_1 ··· τ_q,
observe that sgn π = (−1)^q, and, since πw = (−1)^q w, conclude that πw =
(sgn π)w, as asserted. This proof makes tacit use of the unproved but
easily available fact that if σ and τ are permutations in S_k, then σ(τw) =
(στ)w.) The set of all skew-symmetric k-linear forms is a subspace of the
space of all k-linear forms. To get a non-trivial example of a skew-symmetric
bilinear form w, let y_1 and y_2 be linear functionals and write

    w(x_1, x_2) = y_1(x_1)y_2(x_2) − y_1(x_2)y_2(x_1).

More generally, if w is an arbitrary k-linear form, a skew-symmetric k-linear
form can be obtained from w by forming Σ (sgn π)πw, where the
summation is extended over all permutations π in S_k.
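
The signed sum can be computed in the same spirit. The sketch below makes
several assumptions of its own: V is R^3, the sign is computed by counting
inversions (which agrees with the definition of § 28, since each inversion
contributes exactly one change of sign among the factors of f), and the names
`sgn`, `act`, and `skew_symmetrize` are hypothetical. For the particular
trilinear form chosen, the result is the familiar 3-by-3 determinant
expansion.

```python
from itertools import permutations

def sgn(pi):
    """Sign of pi = (pi(1), ..., pi(k)), computed by counting inversions."""
    k = len(pi)
    inversions = sum(1 for i in range(k) for j in range(i + 1, k)
                     if pi[i] > pi[j])
    return -1 if inversions % 2 else 1

def act(pi, w):
    """Return pi w, where (pi w)(x_1, ..., x_k) = w(x_pi(1), ..., x_pi(k))."""
    return lambda *xs: w(*(xs[pi[i] - 1] for i in range(len(pi))))

def skew_symmetrize(w, k):
    """The sum of (sgn pi) pi w over all permutations pi in S_k."""
    perms = list(permutations(range(1, k + 1)))
    return lambda *xs: sum(sgn(pi) * act(pi, w)(*xs) for pi in perms)

w = lambda x, y, z: x[0] * y[1] * z[2]     # a trilinear form on R^3
d = skew_symmetrize(w, 3)

a, b, c = (1, 2, 3), (4, 5, 6), (7, 8, 10)
# Interchanging two arguments changes the sign; equal arguments give 0.
print(d(a, b, c), d(b, a, c), d(a, a, c))
```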

A k-linear form w is called alternating if w(x_1, ..., x_k) = 0 whenever two
of the x's are equal. (Note that if k = 1, then this condition is vacuously
satisfied.) The set of all alternating k-linear forms is a subspace of the
space of all k-linear forms. There is an important relation between
alternating and skew-symmetric forms.

Theorem 1. Every alternating multilinear form is skew-symmetric.

Proof. Suppose that w is an alternating k-linear form, and that i and j
are integers, 1 ≤ i < j ≤ k. If x_1, ..., x_k are vectors, we write

    w_0(x_i, x_j) = w(x_1, ..., x_k);

if the x's other than x_i and x_j are held fixed (temporarily), then w_0 is an
alternating bilinear form of its two arguments. Since, by bilinearity,

    w_0(x_i + x_j, x_i + x_j) = w_0(x_i, x_i) + w_0(x_i, x_j) + w_0(x_j, x_i) + w_0(x_j, x_j),

and since, by the alternating character of w_0, the left side and the two
extreme terms of the right side of this equation all vanish, we see that
w_0(x_j, x_i) = −w_0(x_i, x_j). This, however, says that

    (i, j)w(x_1, ..., x_k) = −w(x_1, ..., x_k),

or, since the x's are arbitrary, that (i, j)w = −w. Since every odd
permutation π is the product of an odd number of transpositions, such as (i, j),
it follows that πw = −w for every odd π, and the proof of the theorem is
complete.

The connection between alternating forms and skew-symmetric ones
involves one subtle point. Consider the following “proof” of the converse of
Theorem 1: if $w$ is a skew-symmetric $k$-linear form, if $1 \le i < j \le k$, and
if $x_1, \dots, x_k$ are vectors such that $x_i = x_j$, then
$(i, j)w(x_1, \dots, x_k) = w(x_1, \dots, x_k)$ since $x_i = x_j$, and at the same time,
$(i, j)w(x_1, \dots, x_k) = -w(x_1, \dots, x_k)$ since $w$ is skew-symmetric;
consequently $w(x_1, \dots, x_k) = -w(x_1, \dots, x_k)$, so that $w$ is alternating.
This argument is wrong; the trouble is in the inference “if $w = -w$, then
$w = 0$.” If we examine that inference in more detail, we find that it is based
on the following reasoning: if $w = -w$, then $w + w = 0$, so that $(1 + 1)w = 0$.
This is correct. The trouble is that in certain fields $1 + 1 = 0$, and
therefore the inference from $(1 + 1)w = 0$ to $w = 0$ is not justified; the
converse of Theorem 1 is, in fact, false for vector spaces over such fields.
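
The failure of the converse can be checked by brute force in the smallest
possible case. The following sketch is an illustration only and is not part of
the argument: it assumes that the field of scalars is the two-element field
(the integers modulo 2) and uses the hypothetical bilinear form w(x, y) = xy on
the one-dimensional space over that field. The names w, neg, is_skew, and
is_alternating are introduced here for the example.

```python
# Scalars: the two-element field {0, 1}, in which 1 + 1 = 0 (an assumption
# made for this illustration).
P = 2

def w(x, y):
    """Hypothetical bilinear form w(x, y) = x*y on the one-dimensional space."""
    return (x * y) % P

def neg(a):
    """Additive inverse in the two-element field."""
    return (-a) % P

scalars = range(P)

# Skew-symmetry: interchanging the two arguments changes the sign.
is_skew = all(w(y, x) == neg(w(x, y)) for x in scalars for y in scalars)

# Alternating: w vanishes whenever its two arguments coincide.
is_alternating = all(w(x, x) == 0 for x in scalars)

print(is_skew, is_alternating)
```

In this field -1 = 1, so the form is skew-symmetric, yet w(1, 1) = 1, so it is
not alternating.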

Theorem 2. If $x_1, \dots, x_k$ are linearly dependent vectors and if $w$ is an
alternating $k$-linear form, then $w(x_1, \dots, x_k) = 0$.

proof. If $x_i = 0$ for some $i$, the conclusion is trivial. If all the $x_i$ are
different from 0, we apply the theorem of § 6 to find an $x_h$, $2 \le h \le k$,
that is a linear combination of the preceding ones. If, say,
$x_h = \sum_{i=1}^{h-1} \alpha_i x_i$, replace $x_h$ in $w(x_1, \dots, x_k)$ by this
expansion, use the linearity of $w(x_1, \dots, x_k)$ in its $h$-th argument, and draw
the desired conclusion by an $(h - 1)$-fold application of the assumption that
$w$ is alternating.

In one extreme case (namely, when $k = n$) a sort of converse of Theorem
2 is true.

Theorem 3. If $w$ is a non-zero alternating $n$-linear form, and if
$x_1, \dots, x_n$ are linearly independent vectors, then $w(x_1, \dots, x_n) \ne 0$.

proof. Since (§ 8, Theorem 2) the vectors $x_1, \dots, x_n$ form a basis, we
may, given an arbitrary set of $n$ vectors $y_1, \dots, y_n$, write each $y$ as a
linear combination of the $x$'s. If we replace each $y$ in $w(y_1, \dots, y_n)$ by the
corresponding linear combination of $x$'s and expand the result by
multilinearity, we obtain a long linear combination of terms such as
$w(z_1, \dots, z_n)$, where each $z$ is one of the $x$'s. If, in such a term, two of
the $z$'s coincide, then, since $w$ is alternating, that term must vanish. If, on
the other hand, all the $z$'s are distinct, then
$w(z_1, \dots, z_n) = \pi w(x_1, \dots, x_n)$ for some permutation $\pi$. Since
(Theorem 1) $w$ is skew-symmetric, it follows that
$w(z_1, \dots, z_n) = (\operatorname{sgn} \pi) w(x_1, \dots, x_n)$. If $w(x_1, \dots, x_n)$
were 0, it would follow that $w(z_1, \dots, z_n) = 0$, and hence that
$w(y_1, \dots, y_n) = 0$ for all $y_1, \dots, y_n$, contradicting the assumption that
$w \ne 0$.

The proof (not the statement) of this result yields a valuable corollary.

Theorem 4. Any two alternating n-linear forms are linearly dependent. 

proof. Suppose that $w_1$ and $w_2$ are alternating $n$-linear forms and that
$\{x_1, \dots, x_n\}$ is a basis. Given any $n$ vectors $y_1, \dots, y_n$, write each of
them as a linear combination of the $x$'s, and, just as above, replace each of
them, in both $w_1(y_1, \dots, y_n)$ and $w_2(y_1, \dots, y_n)$, by the corresponding
linear combination. It follows that each of $w_1(y_1, \dots, y_n)$ and
$w_2(y_1, \dots, y_n)$ is a linear combination (the same linear combination) of
terms such as $w_1(z_1, \dots, z_n)$ and $w_2(z_1, \dots, z_n)$, where each $z$ is one of
the $x$'s. Since $w_1(x_1, \dots, x_n)$ and $w_2(x_1, \dots, x_n)$ are scalars, they are
linearly dependent, so that there exist scalars $\alpha_1$ and $\alpha_2$, not both
zero, such that $\alpha_1 w_1(x_1, \dots, x_n) + \alpha_2 w_2(x_1, \dots, x_n) = 0$; from
these facts we may infer that $\alpha_1 w_1 + \alpha_2 w_2 = 0$, as asserted.


§ 31. Alternating forms of maximal degree 

Glancing back at the last section, the reader will observe that we did not
give any non-trivial examples of alternating $k$-linear forms, and we did not
even indirectly hint at any existence theorem concerning them. In fact
they do not always exist; § 30, Theorem 2 implies, for instance, that if
$k > n$, then 0 is the only alternating $k$-linear form. (See § 8, Theorem 2.)
For the applications we have in mind, we need only one existence theorem;
we proceed to prove a rather sharp form of it.

Theorem. If $n > 0$, the vector space of alternating $n$-linear forms on
an $n$-dimensional vector space is one-dimensional.

proof. We show first that if $1 \le k \le n$, then there exists at least one
non-zero alternating $k$-linear form; the proof goes by induction on $k$. If
$k = 1$, the desired result follows from the existence of non-trivial linear
functionals (see § 15, Theorem 3). If $1 \le k < n$, we assume that $v$ is a
non-zero alternating $k$-linear form; using $v$ we shall construct a non-zero
alternating $(k + 1)$-linear form $w$. Since $v \ne 0$, we can find vectors
$x_1^0, \dots, x_k^0$ such that $v(x_1^0, \dots, x_k^0) \ne 0$ (the superscripts are just
indices here). Since $k < n$, we can find a vector $x_{k+1}^0$ that does not belong
to the subspace spanned by $x_1^0, \dots, x_k^0$, and (see § 17, Theorem 1) then we
can find a linear functional $u$ such that $u(x_1^0) = \cdots = u(x_k^0) = 0$ and
$u(x_{k+1}^0) \ne 0$.

The promised $(k + 1)$-linear form $w$ is obtained from the linear
functional $u$ and the $k$-linear form $v$ by writing

(1) $$w(x_1, \dots, x_k, x_{k+1}) = \sum_{i=1}^{k} (i, k+1)\, v(x_1, \dots, x_k) u(x_{k+1}) - v(x_1, \dots, x_k) u(x_{k+1}).$$

Thus, for instance, if $k = 3$, then

$$w(x_1, x_2, x_3, x_4) = v(x_4, x_2, x_3) u(x_1) + v(x_1, x_4, x_3) u(x_2) + v(x_1, x_2, x_4) u(x_3) - v(x_1, x_2, x_3) u(x_4).$$

It follows from the elementary discussion in § 29 that w is indeed a (k + 1)- 
linear form; we are to prove that it is non-zero and alternating. 

The fact that $w$ is not identically zero is easy to prove. Indeed, since
$u(x_i^0) = 0$ for $i = 1, \dots, k$, it follows that if we replace each $x_i$ by $x_i^0$
in (1), $i = 1, \dots, k + 1$, then the first $k$ terms of the sum on the right all
vanish, and, consequently,

(2) $$w(x_1^0, \dots, x_k^0, x_{k+1}^0) = -v(x_1^0, \dots, x_k^0) u(x_{k+1}^0) \ne 0.$$

Suppose now that $x_1, \dots, x_k, x_{k+1}$ are vectors and $i$ and $j$ are integers
such that $1 \le i < j \le k + 1$ and $x_i = x_j$. We are to prove that, under
these circumstances, $w(x_1, \dots, x_k, x_{k+1}) = 0$. We note that both $x_i$ and
$x_j$ occur in the argument of $v$ in all but two of the $k + 1$ terms on the right
side of (1). Since $v$ is alternating, the terms in which both $x_i$ and $x_j$ do so
occur all vanish.

The remainder of the proof splits naturally into two cases. If $j = k + 1$,
then all that is left is

$$(i, k+1)\, v(x_1, \dots, x_k) u(x_{k+1}) - v(x_1, \dots, x_k) u(x_{k+1}),$$

and, since $x_i = x_{k+1}$, this is clearly equal to 0. If $j \le k$, then each of the
two possibly non-vanishing terms that are still left can be obtained from
the other by an application of the transposition $(i, j)$. It follows that those
terms differ in sign only, and hence that their sum is zero. This proves
that $w$ is alternating and proves, therefore, that the dimension of the space
of alternating $n$-linear forms is not less than 1.

The fact that the dimension of the space of alternating $n$-linear forms
is not more than 1 is an immediate consequence of § 30, Theorem 4. 
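
The inductive construction in formula (1) can be tried out numerically. The
sketch below is only an illustration under stated assumptions: vectors are
triples of integers, the linear functionals are the three coordinate
functionals (a hypothetical choice made for the example), and the helper names
extend, coord, and det3 are ours. Starting from a coordinate functional and
applying (1) twice yields an alternating 3-linear form on a three-dimensional
space; on the sample vectors used here it agrees with the familiar 3-by-3
determinant, as the one-dimensionality assertion of the theorem leads one to
expect.

```python
def extend(v, u, k):
    """Build the (k+1)-linear form w of formula (1) from a k-linear form v
    and a linear functional u."""
    def w(*xs):
        assert len(xs) == k + 1
        head, last = xs[:k], xs[k]
        # The final term of (1): -v(x_1, ..., x_k) u(x_{k+1}).
        total = -v(*head) * u(last)
        # The sum in (1): the transposition (i, k+1) puts x_{k+1} into slot i
        # of v and hands x_i to u.
        for i in range(k):
            swapped = list(head)
            swapped[i] = last
            total += v(*swapped) * u(head[i])
        return total
    return w

def coord(j):
    """The j-th coordinate functional (hypothetical choice for this example)."""
    return lambda x: x[j]

w1 = coord(0)                    # a non-zero 1-linear form
w2 = extend(w1, coord(1), 1)     # alternating 2-linear form
w3 = extend(w2, coord(2), 2)     # alternating 3-linear form

def det3(x, y, z):
    """The 3-by-3 determinant with rows x, y, z, written out for comparison."""
    return (x[0] * (y[1] * z[2] - y[2] * z[1])
            - x[1] * (y[0] * z[2] - y[2] * z[0])
            + x[2] * (y[0] * z[1] - y[1] * z[0]))

e1, e2, e3 = (1, 0, 0), (0, 1, 0), (0, 0, 1)
print(w3(e1, e2, e3))
print(w3((1, 2, 3), (1, 2, 3), (7, 8, 10)))
```

The second printed value illustrates the alternating property: two equal
arguments force the value 0.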

This concludes our discussion of multilinear algebra. The reader might 
well charge that the discussion was not very strongly motivated. The 
complete motivation cannot be contained in this book; the justification for 
studying multilinear algebra is the wide applicability of the subject. The 
only application that we shall make is to the theory of determinants (which, 
to be sure, could be treated by more direct but less elegant methods, involv- 
ing much greater dependence on arbitrary choices of bases) ; that applica- 
tion belongs to the next chapter. 





EXERCISES 

1. Interpret the following matrices as linear transformations on $\mathcal{C}^2$ or
$\mathcal{C}^3$ and, in each case, find a basis such that the matrix of the
transformation with respect to that basis is triangular.

2. Give an example of a skew-symmetric multilinear form that is not alternating. 
(Recall that in view of the discussion in § 30 the field of scalars must have charac- 
teristic 2.) 

3. Give an example of a non-zero alternating $k$-linear form $w$ on an
$n$-dimensional space ($k < n$), such that $w(x_1, \dots, x_k) = 0$ for some set of
linearly independent vectors $x_1, \dots, x_k$.

4. What is the dimension of the space of all symmetric $k$-linear forms? What
about the skew-symmetric ones? What about the alternating ones?


§ 32. Linear transformations

We come now to the objects that really make vector spaces interesting. 

Definition. A linear transformation (or operator) $A$ on a vector space
$\mathcal{V}$ is a correspondence that assigns to every vector $x$ in $\mathcal{V}$ a vector
$Ax$ in $\mathcal{V}$, in such a way that

$$A(\alpha x + \beta y) = \alpha Ax + \beta Ay$$

identically in the vectors $x$ and $y$ and the scalars $\alpha$ and $\beta$.

We make again the remark that we made in connection with the definition
of linear functionals, namely, that for a linear transformation $A$, as we
defined it, $A0 = 0$. For this reason such transformations are sometimes
called homogeneous linear transformations.

Before discussing any properties of linear transformations we give sev- 
eral examples. We shall not bother to prove that the transformations we 
mention are indeed linear; in all cases the verification of the equation that 
defines linearity is a simple exercise. 

(1) Two special transformations of considerable importance for the study
that follows, and for which we shall consistently reserve the symbols 0 and
1 respectively, are defined (for all $x$) by $0x = 0$ and $1x = x$.

(2) Let $x_0$ be any fixed vector in $\mathcal{V}$, and let $y_0$ be any linear functional
on $\mathcal{V}$; write $Ax = y_0(x) \cdot x_0$. More generally: let $\{x_1, \dots, x_n\}$ be an
arbitrary finite set of vectors in $\mathcal{V}$ and let $\{y_1, \dots, y_n\}$ be a
corresponding set of linear functionals on $\mathcal{V}$; write
$Ax = y_1(x)x_1 + \cdots + y_n(x)x_n$. It is not difficult to prove that if, in
particular, $\mathcal{V}$ is $n$-dimensional, and the vectors $x_1, \dots, x_n$ form a basis
for $\mathcal{V}$, then every linear transformation $A$ has the form just described.

(3) Let $\pi$ be a permutation of the integers $\{1, \dots, n\}$; if
$x = (\xi_1, \dots, \xi_n)$ is a vector in $\mathcal{C}^n$, write
$Ax = (\xi_{\pi(1)}, \dots, \xi_{\pi(n)})$. Similarly, let $\pi$ be a polynomial with
complex coefficients; if $x$ is a vector (polynomial) in $\mathcal{P}$, write
$Ax = y$ for the polynomial defined by $y(t) = x(\pi(t))$.

(4) For any $x$ in $\mathcal{P}_n$, $x(t) = \sum_{j=0}^{n-1} \xi_j t^j$, write
$(Dx)(t) = \sum_{j=1}^{n-1} j\xi_j t^{j-1}$.
(We use the letter $D$ here as a reminder that $Dx$ is the derivative of the
polynomial $x$. We remark that we might have defined $D$ on $\mathcal{P}$ as well as
on $\mathcal{P}_n$; we shall make use of this fact later. Observe that for polynomials
the definition of differentiation can be given purely algebraically, and does
not need the usual theory of limiting processes.)


(5) For every $x$ in $\mathcal{P}$, $x(t) = \sum_{j=0}^{n-1} \xi_j t^j$, write
$(Sx)(t) = \sum_{j=0}^{n-1} \frac{\xi_j}{j+1} t^{j+1}$.
(Once more we are disguising by algebraic notation a well-known analytic
concept. Just as in (4) $(Dx)(t)$ stood for $\frac{dx}{dt}$, so here $(Sx)(t)$ is the
same as $\int_0^t x(s)\,ds$.)

(6) Let $m$ be a polynomial with complex coefficients in a variable $t$.
(We may, although it is not particularly profitable to do so, consider $m$
as an element of $\mathcal{P}$.) For every $x$ in $\mathcal{P}$, we write $Mx$ for the polynomial
defined by $(Mx)(t) = m(t)x(t)$. For later purposes we introduce a special
symbol; in case $m(t) = t$, we shall write $T$ for the transformation $M$,
so that $(Tx)(t) = tx(t)$.
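
Examples (4), (5), and (6) are easy to experiment with. The following is a
minimal sketch, assuming that a polynomial is stored as the list of its
coefficients in order of increasing degree; that representation, and the helper
name combine, are choices made for this illustration and do not come from the
text. Exact rational arithmetic is used for S so that no rounding occurs.

```python
from fractions import Fraction

# A polynomial xi_0 + xi_1 t + xi_2 t^2 + ... is stored as [xi_0, xi_1, xi_2, ...].

def D(x):
    """Differentiation: the coefficient of t^(j-1) in Dx is j * xi_j."""
    return [j * c for j, c in enumerate(x)][1:]

def S(x):
    """Integration: the coefficient of t^(j+1) in Sx is xi_j / (j + 1)."""
    return [Fraction(0)] + [Fraction(c) / (j + 1) for j, c in enumerate(x)]

def T(x):
    """Multiplication by t: every coefficient moves up one degree."""
    return [0] + list(x)

def combine(a, x, b, y):
    """The linear combination a*x + b*y of two polynomials."""
    n = max(len(x), len(y))
    x = list(x) + [0] * (n - len(x))
    y = list(y) + [0] * (n - len(y))
    return [a * p + b * q for p, q in zip(x, y)]

x = [1, 2, 3]        # 1 + 2t + 3t^2
y = [0, 0, 0, 4]     # 4t^3

print(D(x), S(x), T(x))
# Spot check of the defining equation of linearity for D:
print(D(combine(2, x, 5, y)) == combine(2, D(x), 5, D(y)))
```

A spot check of this kind is of course no substitute for the simple
verification of linearity that the text leaves to the reader.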


§ 33. Transformations as vectors 

We proceed now to derive certain elementary properties of, and relations 
among, linear transformations on a vector space. More particularly, we 
shall indicate several ways of making new transformations out of old ones; 
we shall generally be satisfied with giving the definition of the new trans- 
formations and we shall omit the proof of linearity. 

If $A$ and $B$ are linear transformations, we define their sum, $S = A + B$,
by the equation $Sx = Ax + Bx$ (for every $x$). We observe that the
commutativity and associativity of addition in $\mathcal{V}$ imply immediately that
the addition of linear transformations is commutative and associative.
Much more than this is true. If we consider the sum of any linear
transformation $A$ and the linear transformation 0 (defined in the preceding
section), we see that $A + 0 = A$. If, for each $A$, we denote by $-A$ the
transformation defined by $(-A)x = -(Ax)$, we see that $A + (-A) = 0$, and
that the transformation $-A$, so defined, is the only linear transformation
$B$ with the property that $A + B = 0$. To sum up: the properties of a
vector space, described in the axioms (A) of § 2, appear again in the set of
all linear transformations on the space; the set of all linear transformations
is an abelian group with respect to the operation of addition.

We continue in the same spirit. By now it will not surprise anybody if 
the axioms (B) and (C) of vector spaces are also satisfied by the set of all 
linear transformations. They are. For any $A$, and any scalar $\alpha$, we define
the product $\alpha A$ by the equation $(\alpha A)x = \alpha(Ax)$. Axioms (B) and (C) are
immediately verified; we sum up as follows.

Theorem. The set of all linear transformations on a vector space is itself
a vector space.

We shall usually ignore this theorem; the reason is that we can say much
more about linear transformations, and the mere fact that they form a 
vector space is used only very rarely. The “much more” that we can say 
is that there exists for linear transformations a more or less decent definition 
of multiplication, which we discuss in the next section. 


EXERCISES 

1. Prove that each of the correspondences described below is a linear trans- 
formation. 

(a) $\mathcal{V}$ is the set $\mathcal{C}$ of complex numbers regarded as a real vector space;
$Ax$ is the complex conjugate of $x$.

(b) $\mathcal{V}$ is $\mathcal{P}$; if $x$ is a polynomial, then $(Ax)(t) = x(t + 1) - x(t)$.

(c) $\mathcal{V}$ is the $k$-fold tensor product of a vector space with itself; $A$ is
such that $A(x_1 \otimes \cdots \otimes x_k) = x_{\pi(1)} \otimes \cdots \otimes x_{\pi(k)}$, where
$\pi$ is a permutation of $\{1, \dots, k\}$.

(d) $\mathcal{V}$ is the set of all $k$-linear forms on a vector space;
$(Aw)(x_1, \dots, x_k) = w(x_{\pi(1)}, \dots, x_{\pi(k)})$, where $\pi$ is a permutation
of $\{1, \dots, k\}$.

(e) $\mathcal{V}$ is the set of all $k$-linear forms on a vector space; if $w$ is in
$\mathcal{V}$, then $Aw = \sum \pi w$, where the summation is extended over all
permutations $\pi$ in $\mathcal{S}_k$.

(f) Same as (e) except that $Aw = \sum (\operatorname{sgn} \pi)\, \pi w$.

2. Prove that if $\mathcal{V}$ is a finite-dimensional vector space, then the space of
all linear transformations on $\mathcal{V}$ is finite-dimensional, and find its
dimension.

3. The concept of a “linear transformation,” as defined in the text, is too
special for some purposes. According to a more general definition, a linear
transformation from a vector space $\mathcal{U}$ to a vector space $\mathcal{V}$ over the same
field is a correspondence $A$ that assigns to every vector $x$ in $\mathcal{U}$ a vector
$Ax$ in $\mathcal{V}$ so that

$$A(\alpha x + \beta y) = \alpha Ax + \beta Ay.$$

Prove that each of the correspondences described below is a linear
transformation in this generalized sense.

(a) $\mathcal{V}$ is the field of scalars of $\mathcal{U}$; $A$ is a linear functional on $\mathcal{U}$.

(b) $\mathcal{U}$ is the direct sum of $\mathcal{V}$ with some other space; $A$ maps each pair
in $\mathcal{U}$ onto its first coordinate.

(c) $\mathcal{V}$ is the quotient of $\mathcal{U}$ modulo a subspace; $A$ maps each vector in
$\mathcal{U}$ onto the coset it determines.

(d) Let $w$ be a bilinear functional on a direct sum $\mathcal{U} \oplus \mathcal{V}_0$. Let $\mathcal{V}$ be
the dual of $\mathcal{V}_0$, and define $A$ to be the correspondence that assigns to each
$x_0$ in $\mathcal{U}$ the linear functional on $\mathcal{V}_0$ obtained from $w$ by setting its first
argument equal to $x_0$.

4. (a) Suppose that $\mathcal{U}$ and $\mathcal{V}$ are vector spaces over the same field. If
$A$ and $B$ are linear transformations from $\mathcal{U}$ to $\mathcal{V}$, if $\alpha$ and $\beta$ are
scalars, and if

$$Cx = \alpha Ax + \beta Bx$$

for each $x$ in $\mathcal{U}$, then $C$ is a linear transformation from $\mathcal{U}$ to $\mathcal{V}$.

(b) If we write, by definition, $C = \alpha A + \beta B$, then the set of all linear
transformations from $\mathcal{U}$ to $\mathcal{V}$ becomes a vector space with respect to this
definition of the linear operations.

(c) Prove that if $\mathcal{U}$ and $\mathcal{V}$ are finite-dimensional, then so is the space of
all linear transformations from $\mathcal{U}$ to $\mathcal{V}$, and find its dimension.

5. Suppose that $\mathcal{M}$ is an $m$-dimensional subspace of an $n$-dimensional vector
space $\mathcal{V}$. Prove that the set of those linear transformations $A$ on $\mathcal{V}$ for
which $Ax = 0$ whenever $x$ is in $\mathcal{M}$ is a subspace of the set of all linear
transformations on $\mathcal{V}$, and find the dimension of that subspace.

§ 34. Products 

The product $P$ of two linear transformations $A$ and $B$, $P = AB$, is
defined by the equation $Px = A(Bx)$.

The notion of multiplication is fundamental for all that follows. Before 
giving any examples to illustrate the meaning of transformation products, 
let us observe the implications of the symbolism, P = AB. To say that 
$P$ is a transformation means, of course, that given a vector $x$, $P$ does
something to it. What it does is found out by operating on $x$ with $B$, that is,
finding $Bx$, and then operating on the result with $A$. In other words, if
we look on the symbol for a transformation as a recipe for performing a 
certain act, then the symbol for the product of two transformations is to 
be read from right to left. The order to transform by AB means to trans- 
form first by B and then by A. This may seem like an undue amount of 
fuss to raise about a small point; however, as we shall soon see, transforma- 
tion multiplication is, in general, not commutative, and the order in 
which we transform makes a lot of difference. 

The most notorious example of non-commutativity is found on the
space $\mathcal{P}$. We consider the differentiation and multiplication
transformations $D$ and $T$, defined by $(Dx)(t) = \frac{dx}{dt}$ and $(Tx)(t) = tx(t)$;
we have

$$(DTx)(t) = \frac{d}{dt}(tx(t)) = x(t) + t\frac{dx}{dt}$$

and

$$(TDx)(t) = t\frac{dx}{dt}.$$

In other words, not only is it false that $DT = TD$ (so that $DT - TD \ne 0$),
but, in fact, $(DT - TD)x = x$ for every $x$, so that $DT - TD = 1$.
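
The relation between DT and TD can be confirmed on particular polynomials. The
sketch below is an illustration under the assumption that a polynomial is
stored as its list of coefficients in order of increasing degree; the helper
names sub and commutator are introduced only for this example.

```python
def D(x):
    """Differentiate a polynomial given by its coefficient list [xi_0, xi_1, ...]."""
    return [j * c for j, c in enumerate(x)][1:]

def T(x):
    """Multiply a polynomial by t."""
    return [0] + list(x)

def sub(x, y):
    """Coefficientwise difference x - y, padding the shorter list with zeros."""
    n = max(len(x), len(y))
    x = list(x) + [0] * (n - len(x))
    y = list(y) + [0] * (n - len(y))
    return [p - q for p, q in zip(x, y)]

def commutator(x):
    """Apply DT - TD to x; note that DT means: first T, then D."""
    return sub(D(T(x)), T(D(x)))

x = [1, 2, 3]                 # 1 + 2t + 3t^2
print(D(T(x)), T(D(x)))       # the two products differ
print(commutator(x))          # and their difference gives back x
```

The order of the calls mirrors the right-to-left reading of a product: D(T(x))
applies T first and D second.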

On the basis of the examples in § 32, the reader should be able to con- 
struct many examples of pairs of non-commutative transformations. Those 
who are used to thinking of linear transformations geometrically can, for 
example, readily convince themselves that the product of two rotations of
$\mathcal{R}^3$ (about the origin) depends in general on the order in which they are
performed. 

Most of the formal algebraic properties of numerical multiplication (with 
the already mentioned notable exception of commutativity) are valid in 
the algebra of transformations. Thus we have 


(1) $A0 = 0A = 0$,

(2) $A1 = 1A = A$,

(3) $A(B + C) = AB + AC$,

(4) $(A + B)C = AC + BC$,

(5) $A(BC) = (AB)C$.


The proofs of all these identities are immediate consequences of the defini- 
tions of addition and multiplication; to illustrate the principle we prove (3), 
one of the distributive laws. The proof consists of the following computa- 
tion: 

$$(A(B + C))x = A((B + C)x) = A(Bx + Cx) = A(Bx) + A(Cx) = (AB)x + (AC)x = (AB + AC)x.$$


§ 35. Polynomials 

The associative law of multiplication enables us to write the product of
three (or more) factors without any parentheses; in particular we may
consider the product of any finite number, say, $m$, of factors all equal to
$A$. This product depends only on $A$ and on $m$ (and not, as we just
remarked, on any bracketing of the factors); we shall denote it by $A^m$. The
justification for this notation is that, although in general transformation
multiplication is not commutative, for the powers of one transformation
we do have the usual laws of exponents, $A^n A^m = A^{n+m}$ and $(A^n)^m = A^{nm}$.
We observe that $A^1 = A$; it is customary also to write, by definition,
$A^0 = 1$. With these definitions the calculus of powers of a single
transformation is almost exactly the same as in ordinary arithmetic. We may,
in particular, define polynomials in a linear transformation. Thus if $p$
is any polynomial with scalar coefficients in a variable $t$, say
$p(t) = \alpha_0 + \alpha_1 t + \cdots + \alpha_n t^n$, we may form the linear transformation

$$p(A) = \alpha_0 1 + \alpha_1 A + \cdots + \alpha_n A^n.$$


The rules for the algebraic manipulation of such polynomials are easy. 
Thus $p(t)q(t) = r(t)$ implies $p(A)q(A) = r(A)$ (so that, in particular, any
$p(A)$ and $q(A)$ are commutative); if $p(t) = \alpha$ (identically), we shall usually
write $p(A) = \alpha$ (instead of $p(A) = \alpha \cdot 1$); this is in harmony with the use
of the symbols 0 and 1 for linear transformations.
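
As a concrete instance of these rules, one may take for A the differentiation
transformation D of § 32. The following sketch rests on assumptions made only
for the illustration: a polynomial x on which the transformations act is stored
as its coefficient list, a scalar polynomial p is likewise given by its list of
coefficients, and the helper names add, scale, and poly_of are ours. It checks
on one vector that p(D)q(D) = r(D) when p(t) = 1 + t, q(t) = 1 - t, and
r(t) = 1 - t^2.

```python
def D(x):
    """Differentiate a polynomial given by its coefficient list [xi_0, xi_1, ...]."""
    return [j * c for j, c in enumerate(x)][1:]

def add(x, y):
    """Coefficientwise sum, padding the shorter list with zeros."""
    n = max(len(x), len(y))
    return [(x[i] if i < len(x) else 0) + (y[i] if i < len(y) else 0)
            for i in range(n)]

def scale(a, x):
    """Scalar multiple a*x."""
    return [a * c for c in x]

def poly_of(p, A):
    """Return the transformation p(A) = p[0]*1 + p[1]*A + ... + p[n]*A^n."""
    def pA(x):
        result, power = [], list(x)       # power holds A^k x, starting at k = 0
        for a in p:
            result = add(result, scale(a, power))
            power = A(power)
        return result
    return pA

p, q, r = [1, 1], [1, -1], [1, 0, -1]     # 1 + t, 1 - t, and their product 1 - t^2
x = [0, 0, 0, 1]                          # the vector t^3

print(poly_of(p, D)(poly_of(q, D)(x)))    # p(D) q(D) x
print(poly_of(q, D)(poly_of(p, D)(x)))    # q(D) p(D) x
print(poly_of(r, D)(x))                   # r(D) x
```

All three computations yield the same coefficient list, in agreement with the
rule and with the commutativity of p(A) and q(A).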

If p is a polynomial in two variables and if A and B are linear transforma- 
tions, it is not usually possible to give any sensible interpretation to p(A, B). 
The trouble, of course, is that $A$ and $B$ may not commute, and even a simple
monomial, such as $s^2 t$, will cause confusion. If $p(s, t) = s^2 t$, what should
we mean by $p(A, B)$? Should it be $A^2 B$, or $ABA$, or $BA^2$? It is important
to recognize that there is a difficulty here; fortunately for us it is not
necessary to try to get around it. We shall work with polynomials in several
variables only in connection with commutative transformations, and then
everything is simple. We observe that if $AB = BA$, then $A^n B^m = B^m A^n$,
and therefore $p(A, B)$ has an unambiguous meaning for every polynomial
p. The formal properties of the correspondence between (commutative) 
transformations and polynomials are just as valid for several variables as 
for one; we omit the details. 

For an example of the possible behavior of the powers of a transformation
we look at the differentiation transformation $D$ on $\mathcal{P}$ (or, just as well, on
$\mathcal{P}_n$, for some $n$). It is easy to see that for every positive integer $k$, and
for every polynomial $x$ in $\mathcal{P}$, we have $(D^k x)(t) = \frac{d^k x}{dt^k}$. We observe
that whatever else $D$ does, it lowers the degree of the polynomial on which it
acts by exactly one unit (assuming, of course, that the degree is $\ge 1$). Let
$x$ be a polynomial of degree $n - 1$, say; what is $D^n x$? Or put it another
way: what is the product of the two (commutative) transformations $D^k$
and $D^{n-k}$ (where $k$ is any integer between 0 and $n$), considered on the
space $\mathcal{P}_n$? We mention this example to bring out the disconcerting fact
implied by the answer to the last question; the product of two
transformations may vanish even though neither one of them is zero. A non-zero
transformation whose product with some non-zero transformation is zero
is called a divisor of zero.
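
The questions just raised can be explored by direct computation. The sketch
below is an illustration under stated assumptions: polynomials are stored as
coefficient lists, n is taken to be 4, and the helper names power and is_zero
are introduced for the example. It tests the product of D^k and D^(n-k) on
the basis 1, t, t^2, t^3 of the space of polynomials of degree at most 3.

```python
def D(x):
    """Differentiate a polynomial given by its coefficient list [xi_0, xi_1, ...]."""
    return [j * c for j, c in enumerate(x)][1:]

def power(A, k):
    """Return the k-th power of the transformation A (the 0-th power is 1)."""
    def Ak(x):
        for _ in range(k):
            x = A(x)
        return list(x)
    return Ak

def is_zero(x):
    """A coefficient list represents the zero polynomial if it has no non-zero entry."""
    return all(c == 0 for c in x)

n = 4
basis = [[0] * j + [1] for j in range(n)]     # 1, t, t^2, t^3

# D^k D^(n-k) sends every basis vector to 0, for each k between 0 and n.
products_vanish = all(
    is_zero(power(D, k)(power(D, n - k)(b)))
    for k in range(n + 1) for b in basis
)

# For 0 < k < n the factor D^k is not the zero transformation: it does not
# annihilate t^(n-1).
factors_nonzero = all(not is_zero(power(D, k)(basis[-1])) for k in range(1, n))

print(products_vanish, factors_nonzero)
```

Since a linear transformation is determined by its values on a basis, checking
the basis vectors is enough for this particular n.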

EXERCISES


1. Calculate the linear transformations $D^n S^n$ and $S^n D^n$, $n = 1, 2, 3, \dots$;
in other words, compute the effect of each such transformation on an arbitrary
element of $\mathcal{P}$. (Here $D$ and $S$ denote the differentiation and integration
transformations defined in § 32.)


2. If $A$ and $B$ are linear transformations such that $AB - BA$ commutes with
$A$, then $A^k B - BA^k = kA^{k-1}(AB - BA)$ for every positive integer $k$.


3. Suppose that $Ax(t) = x(t + 1)$ for every $x$ in $\mathcal{P}_n$; prove that if $D$ is the
differentiation operator, then

$$1 + \frac{D}{1!} + \frac{D^2}{2!} + \cdots + \frac{D^{n-1}}{(n-1)!} = A.$$


4. (a) If $A$ is a linear transformation on an $n$-dimensional vector space, then
there exists a non-zero polynomial $p$ of degree $\le n^2$ such that $p(A) = 0$.

(b) If $Ax = y_0(x)x_0$ (see § 32, (2)), find a non-zero polynomial $p$ such that
$p(A) = 0$. What is the smallest possible degree $p$ can have?

5. The product of linear transformations between different vector spaces is
defined only if they “match” in the following sense. Suppose that $\mathcal{U}$, $\mathcal{V}$,
and $\mathcal{W}$ are vector spaces over the same field, and suppose that $A$ and $B$ are
linear transformations from $\mathcal{U}$ to $\mathcal{V}$ and from $\mathcal{V}$ to $\mathcal{W}$ respectively.
The product $C = BA$ (the order is important) is defined to be the linear
transformation from $\mathcal{U}$ to $\mathcal{W}$ given by $Cx = B(Ax)$. Interpret and prove as
many as possible among the equations § 34, (1)-(5) for this concept of
multiplication.

6. Let $A$ be a linear transformation on an $n$-dimensional vector space $\mathcal{V}$.

(a) Prove that the set of all those linear transformations $B$ on $\mathcal{V}$ for which
$AB = 0$ is a subspace of the space of all linear transformations on $\mathcal{V}$.

(b) Show that by a suitable choice for $A$ the dimension of the subspace
described in (a) can be made to equal 0, or $n$, or $n^2$. What values can this
dimension attain?

(c) Can every subspace of the space of all linear transformations be obtained
in the manner described in (a) (by the choice of a suitable $A$)?

7. Let $A$ be a linear transformation on a vector space $\mathcal{V}$, and consider the
correspondence that assigns to each linear transformation $X$ on $\mathcal{V}$ the linear
transformation $AX$. Prove that this correspondence is a linear transformation
(on the space of all linear transformations). Can every linear transformation
on that space be obtained in this manner (by the choice of a suitable $A$)?


§ 36. Inverses

In each of the two preceding sections we gave an example; these two 
examples bring out the two nasty properties that the multiplication of 
linear transformations has, namely, non-commutativity and the existence 
of divisors of zero. We turn now to the more pleasant properties that 
linear transformations sometimes have.

It may happen that the linear transformation $A$ has one or both of the
following two very special properties.

(i) If $x_1 \ne x_2$, then $Ax_1 \ne Ax_2$.

(ii) To every vector $y$ there corresponds (at least) one vector $x$ such that
$Ax = y$.

If ever $A$ has both these properties we shall say that $A$ is invertible. If
$A$ is invertible, we define a linear transformation, called the inverse of
$A$ and denoted by $A^{-1}$, as follows. If $y_0$ is any vector, we may (by (ii))
find an $x_0$ for which $Ax_0 = y_0$. This $x_0$ is, moreover, uniquely determined,
since $x_0 \ne x_1$ implies (by (i)) that $y_0 = Ax_0 \ne Ax_1$. We define $A^{-1}y_0$
to be $x_0$. To prove that $A^{-1}$ is linear, we evaluate $A^{-1}(\alpha_1 y_1 + \alpha_2 y_2)$.
If $Ax_1 = y_1$ and $Ax_2 = y_2$, then the linearity of $A$ tells us that
$A(\alpha_1 x_1 + \alpha_2 x_2) = \alpha_1 y_1 + \alpha_2 y_2$, so that
$A^{-1}(\alpha_1 y_1 + \alpha_2 y_2) = \alpha_1 x_1 + \alpha_2 x_2 = \alpha_1 A^{-1}y_1 + \alpha_2 A^{-1}y_2$.

As a trivial example of an invertible transformation we mention the 
identity transformation 1; clearly $1^{-1} = 1$. The transformation 0 is not
invertible; it violates both the conditions (i) and (ii) about as strongly as 
they can be violated. 

It is immediate from the definition that for any invertible $A$ we have

$$AA^{-1} = A^{-1}A = 1;$$

we shall now show that these equations serve to characterize $A^{-1}$.

Theorem 1. If $A$, $B$, and $C$ are linear transformations such that

$$AB = CA = 1,$$

then $A$ is invertible and $A^{-1} = B = C$.

proof. If $Ax_1 = Ax_2$, then $CAx_1 = CAx_2$, so that (since $CA = 1$)
$x_1 = x_2$; in other words, the first condition of the definition of invertibility
is satisfied. The second condition is also satisfied, for if $y$ is any vector and
$x = By$, then $y = ABy = Ax$. Multiplying $AB = 1$ on the left, and
$CA = 1$ on the right, by $A^{-1}$, we see that $A^{-1} = B = C$.

To show that neither $AB = 1$ nor $CA = 1$ is, by itself, sufficient to
ensure the invertibility of $A$, we call attention to the differentiation and
integration transformations $D$ and $S$, defined in § 32, (4) and (5). Although
$DS = 1$, neither $D$ nor $S$ is invertible; $D$ violates (i), and $S$ violates (ii).
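
The behavior of D and S can be observed on a specific polynomial. The sketch
below is an illustration only, assuming the coefficient-list representation of
polynomials (lowest degree first) and exact rational arithmetic; nothing in it
is taken from the text beyond the definitions of D and S in § 32.

```python
from fractions import Fraction

def D(x):
    """Differentiate a polynomial given by its coefficient list [xi_0, xi_1, ...]."""
    return [j * c for j, c in enumerate(x)][1:]

def S(x):
    """Integrate from 0: the coefficient of t^(j+1) in Sx is xi_j / (j + 1)."""
    return [Fraction(0)] + [Fraction(c) / (j + 1) for j, c in enumerate(x)]

x = [5, 2, 3]                  # 5 + 2t + 3t^2

print(D(S(x)) == x)            # DS acts as the identity on x
print(S(D(x)))                 # SD loses the constant term of x

# D violates (i): two different polynomials with the same derivative.
print(D([5, 2, 3]) == D([9, 2, 3]))

# S violates (ii): the constant term of Sx is always 0, so the polynomial 1
# is not of the form Sx.
print(S(x)[0])
```

The asymmetry between the two products is what an infinite-dimensional space
such as the space of all polynomials allows.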

In finite-dimensional spaces the situation is much simpler. 

Theorem 2. A linear transformation $A$ on a finite-dimensional vector
space $\mathcal{V}$ is invertible if and only if $Ax = 0$ implies that $x = 0$, or,
alternatively, if and only if every $y$ in $\mathcal{V}$ can be written in the form $y = Ax$.

proof. If $A$ is invertible, both conditions are satisfied; this much is
trivial. Suppose now that $Ax = 0$ implies that $x = 0$. Then $u \ne v$,
that is, $u - v \ne 0$, implies that $A(u - v) \ne 0$, that is, that $Au \ne Av$;
this proves (i). To prove (ii), let $\{x_1, \dots, x_n\}$ be a basis in $\mathcal{V}$; we assert
that $\{Ax_1, \dots, Ax_n\}$ is also a basis. According to § 8, Theorem 2, we
need only prove linear independence. But $\sum_i \alpha_i Ax_i = 0$ means
$A(\sum_i \alpha_i x_i) = 0$, and, by hypothesis, this implies that
$\sum_i \alpha_i x_i = 0$; the linear independence of the $x_i$ now tells us that
$\alpha_1 = \cdots = \alpha_n = 0$. It follows, of course, that every vector $y$ may be
written in the form $y = \sum_i \alpha_i Ax_i = A(\sum_i \alpha_i x_i)$.

Let us assume next that every $y$ is an $Ax$, and let $\{y_1, \dots, y_n\}$ be any
basis in $\mathcal{V}$. Corresponding to each $y_i$ we may find a (not necessarily unique)
$x_i$ for which $y_i = Ax_i$; we assert that $\{x_1, \dots, x_n\}$ is also a basis. For
$\sum_i \alpha_i x_i = 0$ implies $\sum_i \alpha_i Ax_i = \sum_i \alpha_i y_i = 0$, so that
$\alpha_1 = \cdots = \alpha_n = 0$. Consequently every $x$ may be written in the form
$x = \sum_i \alpha_i x_i$, and $Ax = 0$ implies, as in the argument just given, that
$x = 0$.

Theorem 3. If $A$ and $B$ are invertible, then $AB$ is invertible and
$(AB)^{-1} = B^{-1}A^{-1}$. If $A$ is invertible and $\alpha \ne 0$, then $\alpha A$ is invertible
and $(\alpha A)^{-1} = \frac{1}{\alpha}A^{-1}$. If $A$ is invertible, then $A^{-1}$ is invertible
and $(A^{-1})^{-1} = A$.

proof. According to Theorem 1, it is sufficient to prove (for the first
statement) that the product of $AB$ with $B^{-1}A^{-1}$, in both orders, is the
identity; this verification we leave to the reader. The proofs of both the
remaining statements are identical in principle with this proof of the first
statement; the last statement, for example, follows from the fact that the
equations $AA^{-1} = A^{-1}A = 1$ are completely symmetric in $A$ and $A^{-1}$.

We conclude our discussion of inverses with the following comment.
In the spirit of the preceding section we may, if we like, define rational
functions of $A$, whenever possible, by using $A^{-1}$. We shall not find it
useful to do this, except in one case: if $A$ is invertible, then we know that
$A^n$ is also invertible, $n = 1, 2, \dots$; we shall write $A^{-n}$ for $(A^n)^{-1}$, so
that $A^{-n} = (A^{-1})^n$.

EXERCISES 

1. Which of the linear transformations described in § 33, Ex. 1 are invertible? 

2. A linear transformation $A$ is defined on $\mathcal{C}^2$ by

$$A(\xi_1, \xi_2) = (\alpha\xi_1 + \beta\xi_2, \gamma\xi_1 + \delta\xi_2),$$

where $\alpha$, $\beta$, $\gamma$, and $\delta$ are fixed scalars. Prove that $A$ is invertible if
and only if $\alpha\delta - \beta\gamma \ne 0$.

3. If $A$ and $B$ are linear transformations (on the same vector space), then a
necessary and sufficient condition that both $A$ and $B$ be invertible is that
both $AB$ and $BA$ be invertible.


4. If A and B are linear transformations on a finite-dimensional vector space, 
and if AB = 1, then both A and B are invertible. 


5. (a) If $A$, $B$, $C$, and $D$ are linear transformations (all on the same vector
space), and if both $A + B$ and $A - B$ are invertible, then there exist linear
transformations $X$ and $Y$ such that

$$AX + BY = C$$

and

$$BX + AY = D.$$

(b) To what extent are the invertibility assumptions in (a) necessary? 

6. (a) A linear transformation on a finite-dimensional vector space is invertible
if and only if it preserves linear independence. To say that $A$ preserves linear
independence means that whenever $\mathcal{X}$ is a linearly independent set in the
space $\mathcal{V}$ on which $A$ acts, then $A\mathcal{X}$ is also a linearly independent set in
$\mathcal{V}$. (The symbol $A\mathcal{X}$ denotes, of course, the set of all vectors of the form
$Ax$, with $x$ in $\mathcal{X}$.)

(b) Is the assumption of finite-dimensionality needed for the validity of (a)? 

7. Show that if $A$ is a linear transformation such that $A^2 - A + 1 = 0$, then
$A$ is invertible.

8. If $A$ and $B$ are linear transformations (on the same vector space) and if
$AB = 1$, then $A$ is called a left inverse of $B$ and $B$ is called a right inverse
of $A$. Prove that if $A$ has exactly one right inverse, say $B$, then $A$ is
invertible. (Hint: consider $BA + B - 1$.)

9. If $A$ is an invertible linear transformation on a finite-dimensional vector
space $\mathcal{V}$, then there exists a polynomial $p$ such that $A^{-1} = p(A)$. (Hint:
find a non-zero polynomial $q$ of least degree such that $q(A) = 0$ and prove that
its constant term cannot be 0.)

10. Devise a sensible definition of invertibility for linear transformations from 
one vector space to another. Using that definition, decide which (if any) of the 
linear transformations described in § 33, Ex. 3 are invertible. 


§ 37. Matrices 

Let us now pick up the loose threads; having introduced the new concept 
of linear transformation, we must now find out what it has to do with the 
old concepts of bases, linear functionals, etc. 

One of the most important tools in the study of linear transformations 
on finite-dimensional vector spaces is the concept of a matrix. Since this 
concept usually has no decent analogue in infinite-dimensional spaces, 
and since it is possible in most considerations to do without it, we shall try 
not to use it in proving theorems. It is, however, important to know what 
a matrix is; we enter now into the detailed discussion.

Definition. Let $\mathcal{V}$ be an n-dimensional vector space, let
$\mathcal{X} = \{x_1, \dots, x_n\}$ be any basis of $\mathcal{V}$, and let A be
a linear transformation on $\mathcal{V}$. Since every vector is a linear
combination of the $x_i$, we have in particular

$$Ax_j = \sum_i \alpha_{ij} x_i$$

for j = 1, ..., n. The set $(\alpha_{ij})$ of $n^2$ scalars, indexed with the
double subscript i, j, is the matrix of A in the coordinate system
$\mathcal{X}$; we shall generally denote it by [A], or, if it becomes necessary
to indicate the particular basis $\mathcal{X}$ under consideration, by
$[A; \mathcal{X}]$. A matrix $(\alpha_{ij})$ is usually written in the form of
a square array:

$$[A] = \begin{bmatrix}
\alpha_{11} & \alpha_{12} & \cdots & \alpha_{1n} \\
\alpha_{21} & \alpha_{22} & \cdots & \alpha_{2n} \\
\vdots & \vdots & & \vdots \\
\alpha_{n1} & \alpha_{n2} & \cdots & \alpha_{nn}
\end{bmatrix};$$

the scalars $(\alpha_{i1}, \dots, \alpha_{in})$ form a row, and
$(\alpha_{1j}, \dots, \alpha_{nj})$ a column, of [A].

This definition does not define “matrix”; it defines “the matrix associated 
under certain conditions with a linear transformation.” It is often useful 
to consider a matrix as something existing in its own right as a square 
array of scalars; in general, however, a matrix in this book will be tied up 
with a linear transformation and a basis. 

We comment on notation. It is customary to use the same symbol, 
say, A, for the matrix as for the transformation. The justification for 
this is to be found in the discussion below (of properties of matrices). 
We do not follow this custom here, because one of our principal aims, in 
connection with matrices, is to emphasize that they depend on a coordinate 
system (whereas the notion of linear transformation does not), and to 
study how the relation between matrices and linear transformations changes 
as we pass from one coordinate system to another. 

We call attention also to a peculiarity of the indexing of the elements
$\alpha_{ij}$ of a matrix [A]. A basis is a basis, and so far, although we
usually indexed its elements with the first n positive integers, the order of
the elements in it was entirely immaterial. It is customary, however, when
speaking of matrices, to refer to, say, the first row or the first column.
This language is justified only if we think of the elements of the basis
$\mathcal{X}$ as arranged in a definite order. Since in the majority of our
considerations the order of the rows and the columns of a matrix is as
irrelevant as the order of the elements of a basis, we did not include this
aspect of matrices in our definition. It is important, however, to realize
that the appearance of the square array associated with [A] varies with the
ordering of $\mathcal{X}$.





Everything we shall say about matrices can, accordingly, be interpreted 
from two different points of view; either in strict accordance with the 
letter of our definition, or else following a modified definition which makes 
correspond a matrix (with ordered rows and columns) not merely to a 
linear transformation and a basis, but also to an ordering of the basis. 
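
The dependence of the square array on the ordering of the basis can be made concrete. The following is a minimal sketch, not part of the original text; the function name and the 0-based indexing are assumptions made for illustration. If the reordered basis has $x_{\sigma(j)}$ as its j-th element, then the column rule $Ax_j = \sum_i \alpha_{ij} x_i$ gives the new entry in position (i, j) as $\alpha_{\sigma(i)\sigma(j)}$.

```python
def matrix_in_reordered_basis(a, sigma):
    """Matrix of the same transformation after reordering the basis.

    a     -- the matrix (list of rows) in the ordered basis x_0, ..., x_{n-1}
    sigma -- the new ordering: the new j-th basis vector is x_{sigma[j]}
    (0-based indices; both names are hypothetical, chosen for this sketch)
    """
    n = len(a)
    # Entry (i, j) of the new array is alpha_{sigma(i), sigma(j)}.
    return [[a[sigma[i]][sigma[j]] for j in range(n)] for i in range(n)]


a = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

# Interchange the first two basis vectors: rows and columns are permuted alike.
swapped = matrix_in_reordered_basis(a, [1, 0, 2])
```

The same scalars appear in both arrays; only their positions change, which is why the definition itself need not mention an ordering.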

One more word to those in the know. It is a perversity not of the author,
but of nature, that makes us write

$$Ax_j = \sum_i \alpha_{ij} x_i,$$

instead of the more usual equation

$$Ax_i = \sum_j \alpha_{ij} x_j.$$

The reason is that we want the formulas for matrix multiplication and for
the application of matrices to numerical vectors (that is, vectors
$(\xi_1, \dots, \xi_n)$ in $\mathcal{C}^n$) to appear normal, and somewhere in
the process of passing from vectors to their coordinates the indices turn
around. To state our rule explicitly: write $Ax_j$ as a linear combination of
$x_1, \dots, x_n$, and write the coefficients so obtained as the j-th column of
the matrix [A]. (The first index on $\alpha_{ij}$ is always the row index; the
second one, the column index.)

For an example we consider the differentiation transformation D on
the space $\mathcal{P}_n$, and the basis $\{x_1, \dots, x_n\}$ defined by
$x_i(t) = t^{i-1}$, i = 1, ..., n. What is the matrix of D in this basis? We
have

$$\begin{aligned}
Dx_1 &= 0x_1 + 0x_2 + \cdots + 0x_{n-1} + 0x_n \\
Dx_2 &= 1x_1 + 0x_2 + \cdots + 0x_{n-1} + 0x_n \\
Dx_3 &= 0x_1 + 2x_2 + \cdots + 0x_{n-1} + 0x_n \\
&\;\;\vdots \\
Dx_n &= 0x_1 + 0x_2 + \cdots + (n-1)x_{n-1} + 0x_n,
\end{aligned} \tag{1}$$

so that

$$[D] = \begin{bmatrix}
0 & 1 & 0 & \cdots & 0 & 0 \\
0 & 0 & 2 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 0 & n-1 \\
0 & 0 & 0 & \cdots & 0 & 0
\end{bmatrix}. \tag{2}$$

The unpleasant phenomenon of indices turning around is seen by comparing
(1) and (2).
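
The column rule can be carried out mechanically. The sketch below is an illustration only, under the assumption that a polynomial in $\mathcal{P}_n$ is stored as its list of coefficients $(c_0, \dots, c_{n-1})$ with respect to the basis $1, t, \dots, t^{n-1}$; the function names are hypothetical. It applies D to each basis vector and writes the resulting coordinates as a column.

```python
def differentiate(coeffs):
    """Coordinates of the derivative of c_0 + c_1 t + ... + c_{n-1} t^{n-1}.

    The result is padded with a final 0 so that it again has n coordinates.
    """
    n = len(coeffs)
    return [k * coeffs[k] for k in range(1, n)] + [0]


def matrix_of(transform, n):
    """Matrix of a transformation by the column rule.

    The j-th column holds the coordinates of transform(x_j), where x_j is
    the j-th basis vector (0-based here).
    """
    columns = []
    for j in range(n):
        basis_vector = [1 if k == j else 0 for k in range(n)]
        columns.append(transform(basis_vector))
    # columns[j][i] is alpha_ij; rearrange into a list of rows.
    return [[columns[j][i] for j in range(n)] for i in range(n)]


D = matrix_of(differentiate, 4)
for row in D:
    print(row)
```

For n = 4 the rows printed agree with the array (2): the non-zero entries 1, 2, 3 sit just above the diagonal.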





§ 38. Matrices of transformations

There is now a certain amount of routine work to be done, most of which
we shall leave to the imagination. The problem is this: in a fixed coordinate
system $\mathcal{X} = \{x_1, \dots, x_n\}$, knowing the matrices of A and B, how
can we find the matrices of $\alpha A + \beta B$, of AB, of 0, 1, etc.?

Write $[A] = (\alpha_{ij})$, $[B] = (\beta_{ij})$, $C = \alpha A + \beta B$,
$[C] = (\gamma_{ij})$; we assert that

$$\gamma_{ij} = \alpha\alpha_{ij} + \beta\beta_{ij};$$

also if $[0] = (o_{ij})$ and $[1] = (e_{ij})$, then

$$o_{ij} = 0$$

and

$$e_{ij} = \delta_{ij} \quad (= \text{the Kronecker delta}).$$

A more complicated rule is the following: if C = AB, $[C] = (\gamma_{ij})$, then

$$\gamma_{ij} = \sum_k \alpha_{ik}\beta_{kj}.$$

To prove this we use the definition of the matrix associated with a
transformation, and juggle, thus:

$$Cx_j = A(Bx_j) = A\Big(\sum_k \beta_{kj} x_k\Big) = \sum_k \beta_{kj} Ax_k
= \sum_k \beta_{kj}\Big(\sum_i \alpha_{ik} x_i\Big)
= \sum_i \Big(\sum_k \alpha_{ik}\beta_{kj}\Big) x_i.$$
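
A small numerical check of the product rule may be helpful. This is a sketch only; the helper names and the sample matrices are assumptions chosen for illustration. A vector is represented by its coordinates $(\xi_1, \dots, \xi_n)$, a matrix acts by $\eta_i = \sum_j \alpha_{ij}\xi_j$, and we confirm that the array $(\sum_k \alpha_{ik}\beta_{kj})$ acts exactly as "first B, then A".

```python
def mat_mul(a, b):
    """Entry (i, j) of the product is the sum over k of a[i][k] * b[k][j]."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def apply_matrix(a, xi):
    """Coordinates of Ax from the coordinates xi of x."""
    n = len(a)
    return [sum(a[i][j] * xi[j] for j in range(n)) for i in range(n)]


A = [[1, 2],
     [0, 1]]
B = [[0, 1],
     [1, 1]]
xi = [3, -2]

C = mat_mul(A, B)
direct = apply_matrix(C, xi)                      # coordinates of (AB)x
stepwise = apply_matrix(A, apply_matrix(B, xi))   # coordinates of A(Bx)
print(C, direct, stepwise)
```

The two coordinate lists coincide, while `mat_mul(B, A)` gives a different array, a reminder that the order of the factors matters.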

The relation between transformations and matrices is exactly the same 
as the relation between vectors and their coordinates, and the analogue 
of the isomorphism theorem of § 9 is true in the best possible sense. We 
shall make these statements precise. 

With the aid of a fixed basis $\mathcal{X}$, we have made correspond a matrix
[A] to every linear transformation A; the correspondence is described by the
relations $Ax_j = \sum_i \alpha_{ij} x_i$. We assert now that this
correspondence is one-to-one (that is, that the matrices of two different
transformations are different), and that every array $(\alpha_{ij})$ of $n^2$
scalars is the matrix of some transformation. To prove this, we observe in the
first place that knowledge of the matrix of A completely determines A (that is,
that Ax is thereby uniquely defined for every x), as follows: if
$x = \sum_j \xi_j x_j$, then
$Ax = \sum_j \xi_j Ax_j = \sum_j \xi_j \big(\sum_i \alpha_{ij} x_i\big)
= \sum_i \big(\sum_j \alpha_{ij}\xi_j\big) x_i$. (In other words, if
$y = Ax = \sum_i \eta_i x_i$, then

$$\eta_i = \sum_j \alpha_{ij}\xi_j.$$

Compare this with the comments in § 37 on the perversity of indices.)
In the second place, there is no law against reading the relation
$Ax_j = \sum_i \alpha_{ij} x_i$ backwards. If, in other words, $(\alpha_{ij})$
is any array, we may use this relation to define a linear transformation A; it
is clear that the matrix of A will be exactly $(\alpha_{ij})$. (Once more,
however, we emphasize the fundamental fact that this one-to-one correspondence
between transformations and matrices was set up by means of a particular
coordinate system, and that, as we pass from one coordinate system to another,
the same linear transformation may correspond to several matrices, and one
matrix may be the correspondent of many linear transformations.) The following
statement sums up the essential part of the preceding discussion.

Theorem. Among the set of all matrices $(\alpha_{ij})$, $(\beta_{ij})$, etc.,
i, j = 1, ..., n (not considered in relation to linear transformations), we
define sum, scalar multiplication, product, $(o_{ij})$, and $(e_{ij})$, by

$$(\alpha_{ij}) + (\beta_{ij}) = (\alpha_{ij} + \beta_{ij}),$$

$$\alpha(\alpha_{ij}) = (\alpha\alpha_{ij}),$$

$$(\alpha_{ij})(\beta_{ij}) = \Big(\sum_k \alpha_{ik}\beta_{kj}\Big),$$

$$o_{ij} = 0, \qquad e_{ij} = \delta_{ij}.$$

Then the correspondence (established by means of an arbitrary coordinate
system $\mathcal{X} = \{x_1, \dots, x_n\}$ of the n-dimensional vector space
$\mathcal{V}$), between all linear transformations A on $\mathcal{V}$ and all
matrices $(\alpha_{ij})$, described by $Ax_j = \sum_i \alpha_{ij} x_i$, is an
isomorphism; in other words, it is a one-to-one correspondence that preserves
sum, scalar multiplication, product, 0, and 1.

We have carefully avoided discussing the matrix of $A^{-1}$. It is possible
to give an expression for $[A^{-1}]$ in terms of the elements $\alpha_{ij}$ of
[A], but the expression is not simple and, fortunately, not useful for us.


EXERCISES 

1. Let A be the linear transformation on $\mathcal{P}_n$ defined by
(Ax)(t) = x(t + 1), and let $\{x_0, \dots, x_{n-1}\}$ be the basis of
$\mathcal{P}_n$ defined by $x_j(t) = t^j$, j = 0, ..., n - 1. Find the matrix
of A with respect to this basis.

2. Find the matrix of the operation of conjugation on $\mathcal{C}$, considered
as a real vector space, with respect to the basis {1, i} (where
$i = \sqrt{-1}$).

3. (a) Let $\pi$ be a permutation of the integers 1, ..., n; if
$x = (\xi_1, \dots, \xi_n)$ is a vector in $\mathcal{C}^n$, write
$Ax = (\xi_{\pi(1)}, \dots, \xi_{\pi(n)})$. If
$x_i = (\delta_{i1}, \dots, \delta_{in})$, find the matrix of A with respect to
$\{x_1, \dots, x_n\}$.

(b) Find all matrices that commute with the matrix of A. 

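4. Consider the vector space consisting of all real two-by-two matrices and let A
be the linear transformation on this space that sends each matrix X onto PX,
where $P = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$. Find the matrix of A
with respect to the basis consisting of
$\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$,
$\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$,
$\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}$,
$\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$.
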
5. Consider the vector space consisting of all linear transformations on a vector
space $\mathcal{V}$, and let A be the (left) multiplication transformation that
sends each transformation X on $\mathcal{V}$ onto PX, where P is some
prescribed transformation on $\mathcal{V}$. Under what conditions on P is A
invertible?

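6. Prove that if I, J, and K are the complex matrices

$$\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \quad
\begin{pmatrix} 0 & i \\ i & 0 \end{pmatrix}, \quad
\begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}$$

respectively (where $i = \sqrt{-1}$), then $I^2 = J^2 = K^2 = -1$,
IJ = -JI = K, JK = -KJ = I, and KI = -IK = J.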

7. (a) Prove that if A, B, and C are linear transformations on a two-dimensional
vector space, then $(AB - BA)^2$ commutes with C.

(b) Is the conclusion of (a) true for higher-dimensional spaces? 

8. Let A be the linear transformation on $\mathcal{C}^2$ defined by
$A(\xi_1, \xi_2) = (\xi_1 + \xi_2, \xi_2)$. Prove that if a linear
transformation B commutes with A, then there exists a polynomial p such that
B = p(A).

9. For which of the following polynomials p and matrices A is it true that
p(A) = 0?

(a) $p(t) = t^3 - 3t^2 + 3t - 1$, A =


(b) $p(t) = t^2 - 3t$,
$A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}$.

(c) p(t) = t z + t 2 + t+l,A = (l 

\0 

(d) $p(t) = t^3 - 2t$,
$A = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}$.

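10. Prove that if A and B are the complex matrices

$$\begin{bmatrix}
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0
\end{bmatrix}
\quad \text{and} \quad
\begin{bmatrix}
i & 0 & 0 & 0 \\
0 & -1 & 0 & 0 \\
0 & 0 & -i & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}$$

respectively (where $i = \sqrt{-1}$), and if C = AB - iBA, then
$C^3 + C^2 + C = 0$.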

11. If A and B are linear transformations on a vector space, and if AB = 0, 
does it follow that BA = 0? 

12. What happens to the matrix of a linear transformation on a finite-dimensional
vector space when the elements of the basis with respect to which the matrix is
computed are permuted among themselves?


1 1 

011J. 
0 0 1 / 


:?)• 
13. (a) Suppose that $\mathcal{V}$ is a finite-dimensional vector space with
basis $\{x_1, \dots, x_n\}$. Suppose that $\alpha_1, \dots, \alpha_n$ are
pairwise distinct scalars. If A is a linear transformation such that
$Ax_j = \alpha_j x_j$, j = 1, ..., n, and if B is a linear transformation
that commutes with A, then there exist scalars $\beta_1, \dots, \beta_n$ such
that $Bx_j = \beta_j x_j$.

(b) Prove that if B is a linear transformation on a finite-dimensional vector
space $\mathcal{V}$ and if B commutes with every linear transformation on
$\mathcal{V}$, then B is a scalar (that is, there exists a scalar $\beta$ such
that $Bx = \beta x$ for all x in $\mathcal{V}$).

14. If $\{x_1, \dots, x_k\}$ and $\{y_1, \dots, y_k\}$ are linearly independent
sets of vectors in a finite-dimensional vector space $\mathcal{V}$, then there
exists an invertible linear transformation A on $\mathcal{V}$ such that
$Ax_j = y_j$, j = 1, ..., k.

15. If a matrix $[A] = (\alpha_{ij})$ is such that $\alpha_{ii} = 0$,
i = 1, ..., n, then there exist matrices $[B] = (\beta_{ij})$ and
$[C] = (\gamma_{ij})$ such that [A] = [B][C] - [C][B]. (Hint: try
$\beta_{ij} = \beta_i\delta_{ij}$.)

16. Decide which of the following matrices are invertible and find the inverses 
of the ones that are. 

« C !)• 

« (! :)• 

<°> C D- 

< o> 

17. For which values of a are the following matrices invertible? Find the in- 
verses whenever possible. 

<*> c i> <»> c :)• 

® (! »)• 

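18. For which values of $\alpha$ are the following matrices invertible? Find
the inverses whenever possible.

(a) $\begin{pmatrix} 1 & \alpha & 0 \\ \alpha & 1 & \alpha \\ 0 & \alpha & 1 \end{pmatrix}$.

(b) $\begin{pmatrix} \alpha & 1 & 0 \\ 1 & \alpha & 1 \\ 0 & 1 & \alpha \end{pmatrix}$.

(c) $\begin{pmatrix} 0 & 1 & \alpha \\ 1 & \alpha & 0 \\ \alpha & 0 & 1 \end{pmatrix}$.

(d) $\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & \alpha \\ 1 & \alpha & 1 \end{pmatrix}$.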


16. (e) $\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}$.

(f) $\begin{pmatrix} 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \end{pmatrix}$.

(g) $\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}$.

19. (a) It is easy to extend matrix theory to linear transformations between
different vector spaces. Suppose that $\mathcal{U}$ and $\mathcal{V}$ are
vector spaces over the same field, let $\{x_1, \dots, x_n\}$ and
$\{y_1, \dots, y_m\}$ be bases of $\mathcal{U}$ and $\mathcal{V}$ respectively,
and let A be a linear transformation from $\mathcal{U}$ to $\mathcal{V}$. The
matrix of A is, by definition, the rectangular, m by n, array of scalars
defined by

$$Ax_j = \sum_i \alpha_{ij} y_i.$$

Define addition and multiplication of rectangular matrices so as to generalize as
many as possible of the results of § 38. (Note that the product of an $m_1$ by
$n_1$ matrix and an $m_2$ by $n_2$ matrix, in that order, will be defined only
if $n_1 = m_2$.)

(b) Suppose that A and B are multipliable matrices. Partition A into four
rectangular blocks (top left, top right, bottom left, bottom right) and then partition
B similarly so that the number of columns in the top left part of A is the same as
the number of rows in the top left part of B. If, in an obvious shorthand, these
partitions are indicated by

$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \qquad
B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix},$$

then

$$AB = \begin{pmatrix}
A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\
A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22}
\end{pmatrix}.$$

(c) Use subspaces and complements to express the result of (b) in terms of 
linear transformations (instead of matrices). 

(d) Generalize both (b) and (c) to larger numbers of pieces (instead of four). 


§ 39. Invariance 

A possible relation between subspaces $\mathcal{M}$ of a vector space and linear
transformations A on that space is invariance. We say that $\mathcal{M}$ is
invariant under A if x in $\mathcal{M}$ implies that Ax is in $\mathcal{M}$.
(Observe that the implication relation is required in one direction only; we do
not assume that every y in $\mathcal{M}$ can be written in the form y = Ax with
x in $\mathcal{M}$; we do not even assume that Ax in $\mathcal{M}$ implies x in
$\mathcal{M}$. Presently we shall see examples in which the conditions we did
not assume definitely fail to hold.) We know that a subspace of a vector space
is itself a vector space; if we know that $\mathcal{M}$ is invariant under A,
we may ignore the fact that A is defined outside $\mathcal{M}$ and we may
consider A as a linear transformation defined on the vector space
$\mathcal{M}$. Invariance is often considered for sets of linear
transformations, as well as for a single one; $\mathcal{M}$ is invariant under
a set if it is invariant under each member of the set.

What can be said about the matrix of a linear transformation A on an
n-dimensional vector space $\mathcal{V}$ if we know that some $\mathcal{M}$ is
invariant under A? In other words: is there a clever way of selecting a basis
$\mathcal{X} = \{x_1, \dots, x_n\}$ in $\mathcal{V}$ so that
$[A] = [A; \mathcal{X}]$ will have some particularly simple form? The answer is
in § 12, Theorem 2; we may choose $\mathcal{X}$ so that $x_1, \dots, x_m$ are
in $\mathcal{M}$ and $x_{m+1}, \dots, x_n$ are not. Let us express $Ax_j$ in
terms of $x_1, \dots, x_n$. For $m + 1 \le j \le n$, there is not much we can
say: $Ax_j = \sum_i \alpha_{ij} x_i$. For $1 \le j \le m$, however, $x_j$ is in
$\mathcal{M}$, and therefore (since $\mathcal{M}$ is invariant under A) $Ax_j$
is in $\mathcal{M}$. Consequently, in this case $Ax_j$ is a linear combination
of $x_1, \dots, x_m$; the $\alpha_{ij}$ with $m + 1 \le i \le n$ are zero.
Hence the matrix [A] of A, in this coordinate system, will have the form

$$[A] = \begin{pmatrix} [A_1] & [B_0] \\ [0] & [A_2] \end{pmatrix},$$

where $[A_1]$ is the (m-rowed) matrix of A considered as a linear
transformation on the space $\mathcal{M}$ (with respect to the coordinate
system $\{x_1, \dots, x_m\}$), $[A_2]$ and $[B_0]$ are some arrays of scalars
(in size (n - m) by (n - m) and m by (n - m) respectively), and [0] denotes the
rectangular ((n - m) by m) array consisting of zeros only. (It is important to
observe the unpleasant fact that $[B_0]$ need not be zero.)
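
The block form gives a simple test for invariance in coordinates. The following is a sketch under the assumption that the subspace in question is the span of the first m basis vectors; the function name and the sample array are hypothetical.

```python
def leaves_first_m_invariant(a, m):
    """True when the span of the first m basis vectors is invariant.

    By the column rule this happens exactly when alpha_ij = 0 for all
    i > m >= j, that is, when the lower left (n - m) by m block is zero.
    """
    n = len(a)
    return all(a[i][j] == 0 for i in range(m, n) for j in range(m))


A = [[1, 2, 5],
     [3, 4, 6],
     [0, 0, 7]]

print(leaves_first_m_invariant(A, 2))   # lower left block (0 0) is zero
print(leaves_first_m_invariant(A, 1))   # the entry 3 below alpha_11 is not
```

In this example the upper right block, with entries 5 and 6, is not zero, so the span of the third basis vector alone is not invariant.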

§ 40. Reducibility 

A particularly important subcase of the notion of invariance is that of
reducibility. If $\mathcal{M}$ and $\mathcal{N}$ are two subspaces such that
both are invariant under A and such that $\mathcal{V}$ is their direct sum,
then A is reduced (decomposed) by the pair $(\mathcal{M}, \mathcal{N})$. The
difference between invariance and reducibility is that, in the former case,
among the collection of all subspaces invariant under A we may not be able to
pick out any two, other than $\mathcal{O}$ and $\mathcal{V}$, with the property
that $\mathcal{V}$ is their direct sum. Or, saying it the other way, if
$\mathcal{M}$ is invariant under A, there are, to be sure, many ways of finding
an $\mathcal{N}$ such that $\mathcal{V} = \mathcal{M} \oplus \mathcal{N}$, but
it may happen that no such $\mathcal{N}$ will be invariant under A.

The process described above may also be turned around. Let $\mathcal{M}$ and
$\mathcal{N}$ be any two vector spaces, and let A and B be any two linear
transformations (on $\mathcal{M}$ and $\mathcal{N}$ respectively). Let
$\mathcal{V}$ be the direct sum $\mathcal{M} \oplus \mathcal{N}$; we may define
on $\mathcal{V}$ a linear transformation C called the direct sum of A and B, by
writing

$$Cz = C(x, y) = (Ax, By).$$

We shall omit the detailed discussion of direct sums of transformations;
we shall merely mention the results. Their proofs are easy. If
$(\mathcal{M}, \mathcal{N})$ reduces C, and if we denote by A the linear
transformation C considered on $\mathcal{M}$ alone, and by B the linear
transformation C considered on $\mathcal{N}$ alone, then C is the direct sum of
A and B. By suitable choice of basis (namely, by choosing $x_1, \dots, x_m$ in
$\mathcal{M}$ and $x_{m+1}, \dots, x_n$ in $\mathcal{N}$) we may put the matrix
of the direct sum of A and B in the form displayed in the preceding section,
with $[A_1] = [A]$, $[B_0] = [0]$, and $[A_2] = [B]$. If p is any polynomial,
and if we write A' = p(A), B' = p(B), then the direct sum C' of A' and B' will
be p(C).
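
The last assertion is easy to test on an example. The sketch below is illustrative only; the helper names, the sample matrices, and the polynomial $p(t) = t^2 - 3t + 1$ are all assumptions. It forms the block diagonal matrix of a direct sum and compares p of the direct sum with the direct sum of the p's.

```python
def mat_mul(a, b):
    """Product of matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def poly_of_matrix(coeffs, a):
    """coeffs[0]*1 + coeffs[1]*a + coeffs[2]*a^2 + ..., by Horner's rule."""
    n = len(a)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = mat_mul(result, a)
        for i in range(n):
            result[i][i] += c          # add c times the identity
    return result


def direct_sum(a, b):
    """Block diagonal matrix with a in the top left and b in the bottom right."""
    m, k = len(a), len(b)
    return [row + [0] * k for row in a] + [[0] * m + row for row in b]


A = [[1, 1],
     [0, 1]]
B = [[2]]
p = [1, -3, 1]                          # p(t) = 1 - 3t + t^2

left = poly_of_matrix(p, direct_sum(A, B))
right = direct_sum(poly_of_matrix(p, A), poly_of_matrix(p, B))
print(left == right)
```

The agreement reflects the fact that sums and products of block diagonal arrays are computed block by block.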
EXERCISES

1. Suppose that the matrix of a linear transformation (on a two-dimensional
vector space) with respect to some coordinate system is ^ ^ . How many sub- 
spaces are there invariant under the transformation? 

2. Give an example of a linear transformation A on a finite-dimensional vector
space $\mathcal{V}$ such that $\mathcal{O}$ and $\mathcal{V}$ are the only
subspaces invariant under A.

3. Let D be the differentiation operator on $\mathcal{P}_n$. If $m \le n$, then
the subspace $\mathcal{P}_m$ is invariant under D. Is D on $\mathcal{P}_m$
invertible? Is there a complement of $\mathcal{P}_m$ in $\mathcal{P}_n$ such
that it together with $\mathcal{P}_m$ reduces D?

4. Prove that the subspace spanned by two subspaces, each of which is invariant 
under some linear transformation A, is itself invariant under A. 

§ 41. Projections 

Especially important for our purposes is another connection between 
direct sums and linear transformations. 

Definition. If $\mathcal{V}$ is the direct sum of $\mathcal{M}$ and
$\mathcal{N}$, so that every z in $\mathcal{V}$ may be written, uniquely, in
the form z = x + y, with x in $\mathcal{M}$ and y in $\mathcal{N}$, the
projection on $\mathcal{M}$ along $\mathcal{N}$ is the transformation E defined
by Ez = x.

If direct sums are important, then projections are also, since, as we shall 
see, they are a very powerful algebraic tool in studying the geometric 
concept of direct sum. The reader will easily satisfy himself about the 
reason for the word “projection” by drawing a pair of axes (linear manifolds) 
in the plane (their direct sum). To make the picture look general enough, 
do not draw perpendicular axes! 

We skipped over one point whose proof is easy enough to skip over, but 
whose existence should be recognized; it must be shown that E is a linear 
transformation. We leave this verification to the reader, and go on to 
look for special properties of projections. 

Theorem 1. A linear transformation E is a projection on some subspace
if and only if it is idempotent, that is, $E^2 = E$.

Proof. If E is the projection on $\mathcal{M}$ along $\mathcal{N}$, and if
z = x + y, with x in $\mathcal{M}$ and y in $\mathcal{N}$, then the
decomposition of x is x + 0, so that

$$E^2 z = EEz = Ex = x = Ez.$$

Conversely, suppose that $E^2 = E$. Let $\mathcal{N}$ be the set of all vectors
z in $\mathcal{V}$ for which Ez = 0; let $\mathcal{M}$ be the set of all
vectors z for which Ez = z. It is clear that both $\mathcal{M}$ and
$\mathcal{N}$ are subspaces; we shall prove that
$\mathcal{V} = \mathcal{M} \oplus \mathcal{N}$. In view of the theorem of § 18,
we need to prove that $\mathcal{M}$ and $\mathcal{N}$ are disjoint and that
together they span $\mathcal{V}$.

If z is in $\mathcal{M}$, then Ez = z; if z is in $\mathcal{N}$, then Ez = 0;
hence if z is in both $\mathcal{M}$ and $\mathcal{N}$, then z = 0. For an
arbitrary z we have

$$z = Ez + (1 - E)z.$$

If we write Ez = x and (1 - E)z = y, then $Ex = E^2 z = Ez = x$, and
$Ey = E(1 - E)z = Ez - E^2 z = 0$, so that x is in $\mathcal{M}$ and y is in
$\mathcal{N}$. This proves that $\mathcal{V} = \mathcal{M} \oplus \mathcal{N}$,
and that the projection on $\mathcal{M}$ along $\mathcal{N}$ is precisely E.

As an immediate consequence of the above proof we obtain also the 
following result. 

Theorem 2. If E is the projection on $\mathcal{M}$ along $\mathcal{N}$, then
$\mathcal{M}$ and $\mathcal{N}$ are, respectively, the sets of all solutions of
the equations Ez = z and Ez = 0.

By means of these two theorems we can remove the apparent asymmetry,
in the definition of projections, between the roles played by $\mathcal{M}$ and
$\mathcal{N}$. If to every z = x + y we make correspond not x but y, we also
get an idempotent linear transformation. This transformation (namely, 1 - E)
is the projection on $\mathcal{N}$ along $\mathcal{M}$. We sum up the facts as
follows.

Theorem 3. A linear transformation E is a projection if and only if
1 - E is a projection; if E is the projection on $\mathcal{M}$ along
$\mathcal{N}$, then 1 - E is the projection on $\mathcal{N}$ along
$\mathcal{M}$.
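
These facts can be watched at work in the plane with a pair of axes that are not perpendicular. The sketch below is an illustration only: it assumes that $\mathcal{M}$ and $\mathcal{N}$ are the lines spanned by two given vectors m and n in the coordinate plane, and the function names are hypothetical. Solving z = sm + tn for s by Cramer's rule gives Ez = sm, from which the matrix of E can be read off.

```python
from fractions import Fraction


def projection_2d(m, n):
    """Matrix of the projection on span{m} along span{n} in the plane."""
    det = m[0] * n[1] - m[1] * n[0]
    if det == 0:
        raise ValueError("m and n must be linearly independent")
    # z = s*m + t*n with s = (z0*n1 - z1*n0) / det, and Ez = s*m.
    return [[Fraction(m[0] * n[1], det), Fraction(-m[0] * n[0], det)],
            [Fraction(m[1] * n[1], det), Fraction(-m[1] * n[0], det)]]


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


E = projection_2d((1, 0), (1, 1))        # on the first axis, along the diagonal
one_minus_E = [[(1 if i == j else 0) - E[i][j] for j in range(2)]
               for i in range(2)]

print(mat_mul(E, E) == E)                            # E is idempotent
print(one_minus_E == projection_2d((1, 1), (1, 0)))  # 1 - E swaps the roles
```

Exact fractions are used so that the comparisons are not disturbed by rounding.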

§ 42. Combinations of projections 

Continuing in the spirit of Theorem 3 of the preceding section, we in- 
vestigate conditions under which various algebraic combinations of projec- 
tions are themselves projections. 

Theorem. We assume that $E_1$ and $E_2$ are projections on $\mathcal{M}_1$ and
$\mathcal{M}_2$ along $\mathcal{N}_1$ and $\mathcal{N}_2$ respectively and that
the underlying field of scalars is such that $1 + 1 \ne 0$. We make three
assertions.

(i) $E_1 + E_2$ is a projection if and only if $E_1E_2 = E_2E_1 = 0$; if this
condition is satisfied, then $E = E_1 + E_2$ is the projection on $\mathcal{M}$
along $\mathcal{N}$, where $\mathcal{M} = \mathcal{M}_1 \oplus \mathcal{M}_2$
and $\mathcal{N} = \mathcal{N}_1 \cap \mathcal{N}_2$.

(ii) $E_1 - E_2$ is a projection if and only if $E_1E_2 = E_2E_1 = E_2$; if
this condition is satisfied, then $E = E_1 - E_2$ is the projection on
$\mathcal{M}$ along $\mathcal{N}$, where
$\mathcal{M} = \mathcal{M}_1 \cap \mathcal{N}_2$ and
$\mathcal{N} = \mathcal{N}_1 \oplus \mathcal{M}_2$.

(iii) If $E_1E_2 = E_2E_1 = E$, then E is the projection on $\mathcal{M}$ along
$\mathcal{N}$, where $\mathcal{M} = \mathcal{M}_1 \cap \mathcal{M}_2$ and
$\mathcal{N} = \mathcal{N}_1 + \mathcal{N}_2$.

Proof. We recall the notation. If $\mathcal{H}$ and $\mathcal{K}$ are
subspaces, then $\mathcal{H} + \mathcal{K}$ is the subspace spanned by
$\mathcal{H}$ and $\mathcal{K}$; writing $\mathcal{H} \oplus \mathcal{K}$
implies that $\mathcal{H}$ and $\mathcal{K}$ are disjoint, and then
$\mathcal{H} \oplus \mathcal{K} = \mathcal{H} + \mathcal{K}$; and
$\mathcal{H} \cap \mathcal{K}$ is the intersection of $\mathcal{H}$ and
$\mathcal{K}$.

(i) If $E_1 + E_2 = E$ is a projection, then
$(E_1 + E_2)^2 = E^2 = E = E_1 + E_2$, so that the cross-product terms must
disappear:

$$E_1E_2 + E_2E_1 = 0. \tag{1}$$

If we multiply (1) on both left and right by $E_1$, we obtain

$$E_1E_2 + E_1E_2E_1 = 0,$$

$$E_1E_2E_1 + E_2E_1 = 0;$$

subtracting, we get $E_1E_2 - E_2E_1 = 0$. Hence $E_1$ and $E_2$ are
commutative, and (1) implies that their product is zero. (Here is where we need
the assumption $1 + 1 \ne 0$.) Since, conversely, $E_1E_2 = E_2E_1 = 0$ clearly
implies (1), we see that the condition is also sufficient to ensure that E
be a projection.

Let us suppose, from now on, that E is a projection; by § 41, Theorem 2,
$\mathcal{M}$ and $\mathcal{N}$ are, respectively, the sets of all solutions of
the equations Ez = z and Ez = 0. Let us write $z = x_1 + y_1 = x_2 + y_2$,
where $x_1 = E_1z$ and $x_2 = E_2z$ are in $\mathcal{M}_1$ and $\mathcal{M}_2$,
respectively, and $y_1 = (1 - E_1)z$ and $y_2 = (1 - E_2)z$ are in
$\mathcal{N}_1$ and $\mathcal{N}_2$, respectively. If z is in $\mathcal{M}$,
$E_1z + E_2z = z$, then

$$z = E_1(x_2 + y_2) + E_2(x_1 + y_1) = E_1y_2 + E_2y_1.$$

Since $E_1(E_1y_2) = E_1y_2$ and $E_2(E_2y_1) = E_2y_1$, we have exhibited z as
a sum of a vector from $\mathcal{M}_1$ and a vector from $\mathcal{M}_2$, so
that $\mathcal{M} \subset \mathcal{M}_1 + \mathcal{M}_2$. Conversely, if z is a
sum of a vector from $\mathcal{M}_1$ and a vector from $\mathcal{M}_2$, then
$(E_1 + E_2)z = z$, so that z is in $\mathcal{M}$, and consequently
$\mathcal{M} = \mathcal{M}_1 + \mathcal{M}_2$. Finally, if z belongs to both
$\mathcal{M}_1$ and $\mathcal{M}_2$, so that $E_1z = E_2z = z$, then
$z = E_1z = E_1(E_2z) = 0$, so that $\mathcal{M}_1$ and $\mathcal{M}_2$ are
disjoint; we have proved that $\mathcal{M} = \mathcal{M}_1 \oplus \mathcal{M}_2$.

It remains to find $\mathcal{N}$, that is, to find all solutions of
$E_1z + E_2z = 0$. If z is in $\mathcal{N}_1 \cap \mathcal{N}_2$, this equation
is clearly satisfied; conversely $E_1z + E_2z = 0$ implies (upon multiplication
on the left by $E_1$ and $E_2$ respectively) that $E_1z + E_1E_2z = 0$ and
$E_2E_1z + E_2z = 0$. Since $E_1E_2z = E_2E_1z = 0$ for all z, we obtain
finally $E_1z = E_2z = 0$, so that z belongs to both $\mathcal{N}_1$ and
$\mathcal{N}_2$.

With the technique and the results obtained in this proof, the proofs of 
the remaining parts of the theorem are easy. 

(ii) According to § 41, Theorem 3, $E_1 - E_2$ is a projection if and only
if $1 - (E_1 - E_2) = (1 - E_1) + E_2$ is a projection. According to (i)
this happens (since, of course, $1 - E_1$ is the projection on $\mathcal{N}_1$
along $\mathcal{M}_1$) if and only if

$$(1 - E_1)E_2 = E_2(1 - E_1) = 0, \tag{2}$$

and in this case $(1 - E_1) + E_2$ is the projection on
$\mathcal{N}_1 \oplus \mathcal{M}_2$ along $\mathcal{M}_1 \cap \mathcal{N}_2$.
Since (2) is equivalent to $E_1E_2 = E_2E_1 = E_2$, the proof of (ii)
is complete.

(iii) That $E = E_1E_2 = E_2E_1$ implies that E is a projection is clear,
since E is idempotent. We assume, therefore, that $E_1$ and $E_2$ commute
and we find $\mathcal{M}$ and $\mathcal{N}$. If Ez = z, then
$E_1z = E_1Ez = E_1E_1E_2z = E_1E_2z = z$, and similarly $E_2z = z$, so that z
is contained in both $\mathcal{M}_1$ and $\mathcal{M}_2$. The converse is
clear; if $E_1z = z = E_2z$, then Ez = z. Suppose next that $E_1E_2z = 0$; it
follows that $E_2z$ belongs to $\mathcal{N}_1$, and, from the commutativity of
$E_1$ and $E_2$, that $E_1z$ belongs to $\mathcal{N}_2$. This is more symmetry
than we need; since $z = E_2z + (1 - E_2)z$, and since $(1 - E_2)z$ is in
$\mathcal{N}_2$, we have exhibited z as a sum of a vector from $\mathcal{N}_1$
and a vector from $\mathcal{N}_2$. Conversely if z is such a sum, then
$E_1E_2z = 0$; this concludes the proof that
$\mathcal{N} = \mathcal{N}_1 + \mathcal{N}_2$.

We shall return to theorems of this type later, and we shall obtain, in 
certain cases, more precise results. Before leaving the subject, however, 
we call attention to a few minor peculiarities of the theorem of this section. 
We observe first that although in both (i) and (ii) one of $\mathcal{M}$ and
$\mathcal{N}$ was a direct sum of the given subspaces, in (iii) we stated only
that $\mathcal{N} = \mathcal{N}_1 + \mathcal{N}_2$. Consideration of the
possibility $E_1 = E_2 = E$ shows that this is unavoidable. Also: the condition
of (iii) was asserted to be sufficient only; it is possible to construct
projections $E_1$ and $E_2$ whose product $E_1E_2$ is a projection, but for
which $E_1E_2$ and $E_2E_1$ are different. Finally, it may be conjectured that
it is possible to extend the result of (i), by induction, to more than two
summands. Although this is true, it is surprisingly non-trivial;
we shall prove it later in a special case of interest.
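
Assertion (i) and the remark about non-commuting products can both be checked on small matrices. The sketch below is illustrative; the helper names and the particular projections are assumptions. The first pair consists of two diagonal projections whose products vanish; the second pair, in the plane, projects on the line $\xi_1 = \xi_2$ along the two coordinate axes, and there the products are projections that differ from each other.

```python
def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_add(a, b):
    n = len(a)
    return [[a[i][j] + b[i][j] for j in range(n)] for i in range(n)]


def is_projection(e):
    """A matrix represents a projection exactly when it is idempotent."""
    return mat_mul(e, e) == e


# Products are zero, so the sum is again a projection.
E1 = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
E2 = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
sum_ok = is_projection(mat_add(E1, E2))

# Projections on the diagonal along the second and the first axis:
# (a, b) -> (a, a) and (a, b) -> (b, b).
F1 = [[1, 0], [1, 0]]
F2 = [[0, 1], [0, 1]]
sum_bad = is_projection(mat_add(F1, F2))

print(sum_ok, sum_bad, mat_mul(F1, F2) == F2, mat_mul(F2, F1) == F1)
```

In the second pair each product is a projection, yet the two products are different and the sum is not idempotent, in agreement with the theorem.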

§ 43. Projections and invariance 

We have already seen that the study of projections is equivalent to the 
study of direct sum decompositions. By means of projections we may also 
study the notions of invariance and reducibility. 

Theorem 1. If a subspace $\mathcal{M}$ is invariant under the linear
transformation A, then EAE = AE for every projection E on $\mathcal{M}$.
Conversely, if EAE = AE for some projection E on $\mathcal{M}$, then
$\mathcal{M}$ is invariant under A.

Proof. Suppose that $\mathcal{M}$ is invariant under A and that
$\mathcal{V} = \mathcal{M} \oplus \mathcal{N}$ for some $\mathcal{N}$; let E be
the projection on $\mathcal{M}$ along $\mathcal{N}$. For any z = x + y (with x
in $\mathcal{M}$ and y in $\mathcal{N}$) we have AEz = Ax and EAEz = EAx; since
the presence of x in $\mathcal{M}$ guarantees the presence of Ax in
$\mathcal{M}$, it follows that EAx is also equal to Ax, as desired.

Conversely, suppose that $\mathcal{V} = \mathcal{M} \oplus \mathcal{N}$, and
that EAE = AE for the projection E on $\mathcal{M}$ along $\mathcal{N}$. If x
is in $\mathcal{M}$, then Ex = x, so that

$$EAx = EAEx = AEx = Ax,$$

and consequently Ax is also in $\mathcal{M}$.

Theorem 2. If $\mathcal{M}$ and $\mathcal{N}$ are subspaces with
$\mathcal{V} = \mathcal{M} \oplus \mathcal{N}$, then a necessary and sufficient
condition that the linear transformation A be reduced by the pair
$(\mathcal{M}, \mathcal{N})$ is that EA = AE, where E is the projection on
$\mathcal{M}$ along $\mathcal{N}$.

Proof. First we assume that EA = AE, and we prove that A is reduced
by $(\mathcal{M}, \mathcal{N})$. If x is in $\mathcal{M}$, then
Ax = AEx = EAx, so that Ax is also in $\mathcal{M}$; if x is in $\mathcal{N}$,
then Ex = 0 and EAx = AEx = A0 = 0, so that Ax is also in $\mathcal{N}$.

Next we assume that A is reduced by $(\mathcal{M}, \mathcal{N})$, and we prove
that EA = AE. Since $\mathcal{M}$ is invariant under A, Theorem 1 assures us
that EAE = AE; since $\mathcal{N}$ is also invariant under A, and since 1 - E
is a projection on $\mathcal{N}$, we have, similarly,
(1 - E)A(1 - E) = A(1 - E). From the second equation, after carrying out the
indicated multiplications and simplifying, we obtain EAE = EA; this concludes
the proof of the theorem.
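
Both theorems can be tried out on the block matrices of § 39. In the sketch below, which is an illustration with hypothetical names and sample arrays, $\mathcal{M}$ is the span of the first two basis vectors, $\mathcal{N}$ is the span of the third, and E is the corresponding diagonal projection.

```python
def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


E = [[1, 0, 0],
     [0, 1, 0],
     [0, 0, 0]]

# M is invariant under A (zero lower left block), but N is not.
A = [[1, 2, 5],
     [3, 4, 6],
     [0, 0, 7]]

# Both M and N are invariant under R, so (M, N) reduces R.
R = [[1, 2, 0],
     [3, 4, 0],
     [0, 0, 7]]

invariant = mat_mul(mat_mul(E, A), E) == mat_mul(A, E)   # EAE = AE
reduced_A = mat_mul(E, A) == mat_mul(A, E)               # EA = AE ?
reduced_R = mat_mul(E, R) == mat_mul(R, E)
print(invariant, reduced_A, reduced_R)
```

The first test succeeds for A while the second fails, and both succeed for R: invariance of $\mathcal{M}$ alone gives EAE = AE, and commutativity with E requires the upper right block to vanish as well.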


EXERCISES 

1. (a) Suppose that E is a projection on a vector space $\mathcal{V}$, and
suppose that scalar multiplication is redefined so that the new product of a
scalar $\alpha$ and a vector x is the old product of $\alpha$ and Ex. Show that
vector addition (old) and scalar multiplication (new) satisfy all the axioms on
a vector space except $1 \cdot x = x$.

(b) To what extent is it true that the method described in (a) is the only way to
construct systems satisfying all the axioms on a vector space except
$1 \cdot x = x$?

2. (a) Suppose that $\mathcal{V}$ is a vector space, $x_0$ is a vector in
$\mathcal{V}$, and $y_0$ is a linear functional on $\mathcal{V}$; write
$Ax = [x, y_0]x_0$ for every x in $\mathcal{V}$. Under what conditions on $x_0$
and $y_0$ is A a projection?

(b) If A is the projection on, say, $\mathcal{M}$ along $\mathcal{N}$,
characterize $\mathcal{M}$ and $\mathcal{N}$ in terms of $x_0$ and $y_0$.

3. If A is left multiplication by P on a space of linear transformations (cf.
§ 38, Ex. 5), under what conditions on P is A a projection?

4. If A is a linear transformation, if E is a projection, and if F = 1 - E, then

$$A = EAE + EAF + FAE + FAF.$$

Use this result to prove the multiplication rule for partitioned (square) matrices
(as in § 38, Ex. 19).

5. (a) If $E_1$ and $E_2$ are projections on $\mathcal{M}_1$ and
$\mathcal{M}_2$ along $\mathcal{N}_1$ and $\mathcal{N}_2$ respectively, and if
$E_1$ and $E_2$ commute, then $E_1 + E_2 - E_1E_2$ is a projection.

(b) If $E_1 + E_2 - E_1E_2$ is the projection on $\mathcal{M}$ along
$\mathcal{N}$, describe $\mathcal{M}$ and $\mathcal{N}$ in terms of
$\mathcal{M}_1$, $\mathcal{M}_2$, $\mathcal{N}_1$, and $\mathcal{N}_2$.

6. (a) Find a linear transformation A such that $A^2(1 - A) = 0$ but A is not
idempotent.

(b) Find a linear transformation A such that $A(1 - A)^2 = 0$ but A is not
idempotent.

(c) Prove that if A is a linear transformation such that
$A^2(1 - A) = A(1 - A)^2 = 0$, then A is idempotent.

7. (a) Prove that if E is a projection on a finite-dimensional vector space, then
there exists a basis $\mathcal{X}$ such that the matrix $(e_{ij})$ of E with
respect to $\mathcal{X}$ has the following special form: $e_{ij} = 0$ or 1 for
all i and j, and $e_{ij} = 0$ if $i \ne j$.

(b) An involution is a linear transformation U such that $U^2 = 1$. Show that
if $1 + 1 \ne 0$, then the equation U = 2E - 1 establishes a one-to-one
correspondence between all projections E and all involutions U.

(c) What do (a) and (b) imply about the matrix of an involution on a finite- 
dimensional vector space? 

8. (a) In the space $\mathcal{C}^2$ of all vectors $(\xi_1, \xi_2)$ let
$\mathcal{M}_+$, $\mathcal{N}_1$, and $\mathcal{N}_2$ be the subspaces
characterized by $\xi_1 = \xi_2$, $\xi_1 = 0$, and $\xi_2 = 0$, respectively.
If $E_1$ and $E_2$ are the projections on $\mathcal{M}_+$ along $\mathcal{N}_1$
and $\mathcal{N}_2$ respectively, show that $E_1E_2 = E_2$ and
$E_2E_1 = E_1$.

(b) Let $\mathcal{M}_-$ be the subspace characterized by $\xi_1 = -\xi_2$. If
$E_0$ is the projection on $\mathcal{N}_2$ along $\mathcal{M}_-$, then
$E_2E_0$ is a projection, but $E_0E_2$ is not.

9. Show that if E, F, and G are projections on a vector space over a field whose
characteristic is not equal to 2, and if E + F + G = 1, then
EF = FE = EG = GE = FG = GF = 0. Does the proof work for four projections
instead of three?


§ 44. Adjoints 

Let us study next the relation between the notions of linear transformation 
and dual space. Let V be any vector space and let y be any element 
of V'; for any linear transformation A on V we consider the expression 
[Ax, y]. For each fixed y, the function y' defined by y'(x) = [Ax, y] is 
a linear functional on V; using the square bracket notation for y' as well 
as for y, we have [Ax, y] = [x, y']. If now we allow y to vary over V', 
then this procedure makes correspond to each y a y', depending, of course, 
on y; we write y' = A'y. The defining property of A' is 

(1) [Ax, y] = [x, A'y]. 

We assert that A' is a linear transformation on V'. Indeed, if y = α_1y_1 
+ α_2y_2, then 

[x, A'y] = [Ax, y] = α_1[Ax, y_1] + α_2[Ax, y_2] 

= α_1[x, A'y_1] + α_2[x, A'y_2] = [x, α_1A'y_1 + α_2A'y_2]. 





The linear transformation A' is called the adjoint (or dual) of A ; we dedicate 
this section and the next to studying properties of A'. Let us first get the 
formal algebraic rules out of the way; they go as follows. 


(2) 0' = 0, 

(3) 1' = 1, 

(4) (A + B)' = A' + B', 

(5) (αA)' = αA', 

(6) (AB)' = B'A', 

(7) (A⁻¹)' = (A')⁻¹. 


Here (7) is to be interpreted in the following sense: if A is invertible, 
then so is A', and the equation is valid. The proofs of all these relations 
are elementary; to indicate the procedure, we carry out the computations 
for (6) and (7). To prove (6), merely observe that 

[ABx, y] = [Bx, A'y] = [x, B'A'y]. 

To prove (7), suppose that A is invertible, so that AA⁻¹ = A⁻¹A = 1. 
Applying (3) and (6) to these equations, we obtain 

(A⁻¹)'A' = A'(A⁻¹)' = 1; 

Theorem 1 of § 36 implies that A' is invertible and that (7) is valid. 

In finite-dimensional spaces another important relation holds: 

(8) A" = A. 

This relation has to be read with a grain of salt. As it stands A" is a 
transformation not on V but on the dual space V" of V'. If, however, we identify 
V" and V according to the natural isomorphism, then A" acts on V and 
(8) makes sense. In this interpretation the proof of (8) is trivial. Since 
V is reflexive, we obtain every linear functional on V' by considering 
[x, y] as a function of y, with x fixed in V. Since [x, A'y] defines a function 
(a linear functional) of y, it may be written in the form [x', y]. The vector 
x' here is, by definition, A"x. Hence we have, for every y in V' and for 
every x in V, 

[Ax, y] = [x, A'y] = [A"x, y]; 

the equality of the first and last terms of this chain proves (8). 

Under the hypothesis of (8) (that is, finite-dimensionality), the asymmetry 
in the interpretation of (7) may be removed; we assert that in this 
case the invertibility of A' implies that of A and, therefore, the validity 
of (7). Proof: apply the old interpretation of (7) to A' and A" in place 
of A and A'. 

Our discussion is summed up, in the reflexive finite-dimensional case, 
by the assertion that the mapping A → A' is one-to-one, and, in fact, an 
algebraic anti-isomorphism, from the set of all linear transformations on 
V onto the set of all linear transformations on V'. (The prefix “anti” got 
attached because of the commutation rule (6).) 

§ 45. Adjoints of projections 

There is one important case in which multiplication does not get turned 
around, that is, when (AB)' = A'B'; namely, the case when A and B 
commute. We have, in particular, (Aⁿ)' = (A')ⁿ, and, more generally, 
(p(A))' = p(A') for every polynomial p. It follows from this that if E 
is a projection, then so is E'. The question arises: what direct sum 
decomposition is E' associated with? 

Theorem 1. If E is the projection on M along N, then E' is the projection 
on N° along M°. 

proof. We know already that (E')² = E' and V' = N° ⊕ M° (cf. 
§ 20). It is necessary only to find the subspaces consisting of the solutions 
of E'y = 0 and E'y = y. This we do in four steps. 

(i) If y is in M°, then, for all x, 

[x, E'y] = [Ex, y] = 0, 

so that E'y = 0. 

(ii) If E'y = 0, then, for all x in M, 

[x, y] = [Ex, y] = [x, E'y] = 0, 

so that y is in M°. 

(iii) If y is in N°, then, for all x, 

[x, y] = [Ex, y] + [(1 - E)x, y] = [Ex, y] = [x, E'y], 

so that E'y = y. 

(iv) If E'y = y, then, for all x in N, 

[x, y] = [x, E'y] = [Ex, y] = 0, 

so that y is in N°. 

Steps (i) and (ii) together show that the set of solutions of E'y = 0 
is precisely M°; steps (iii) and (iv) together show that the set of solutions 
of E'y = y is precisely N°. This concludes the proof of the theorem. 

Theorem 2. If M is invariant under A, then M° is invariant under 
A'; if A is reduced by (M, N), then A' is reduced by (M°, N°). 

proof. We shall prove only the first statement; the second one clearly 
follows from it. We first observe the following identity, valid for any 
three linear transformations E, F, and A, subject to the relation F = 1 - E: 

(1) FAF - FA = EAE - AE. 

(Compare this with the proof of § 43, Theorem 2.) Let E be any projection 
on M; by § 43, Theorem 1, the right member of (1) vanishes, and, therefore, 
so does the left member. By taking adjoints, we obtain F'A'F' = A'F'; 
since, by Theorem 1 of the present section, F' = 1 - E' is a projection on 
M°, the proof of Theorem 2 is complete. (Here is an alternative proof of 
the first statement of Theorem 2, a proof that does not make use of the 
fact that V is the direct sum of M and some other subspace. If y is in 
M°, then [x, A'y] = [Ax, y] = 0 for all x in M, and therefore A'y is in M°. 
The only advantage of the algebraic proof given above over this simple 
geometric proof is that the former prepares the ground for future work 
with projections.) 

We conclude our treatment of adjoints by discussing their matrices; 
this discussion is intended to illuminate the entire theory and to enable 
the reader to construct many examples. 

We shall need the following fact: if X = {x_1, ..., x_n} is any basis in the 
n-dimensional vector space V, if X' = {y_1, ..., y_n} is the dual basis in 
V', and if the matrix of the linear transformation A in the coordinate 
system X is (α_ij), then 

(2) α_ij = [Ax_j, y_i]. 

This follows from the definition of the matrix of a linear transformation; 
since Ax_j = Σ_k α_kj x_k, we have 

[Ax_j, y_i] = Σ_k α_kj [x_k, y_i] = α_ij. 

To keep things straight in the applications, we rephrase formula (2) 
verbally, thus: to find the (i, j) element of [A] in the basis X, apply A to 
the j-th element of X and then take the value of the i-th linear functional 
(in X') at the vector so obtained. 

It is now very easy to find the matrix (α'_ij) = [A'] in the coordinate 
system X'; we merely follow the recipe just given. In other words, we 
consider A'y_j, and take the value of the i-th linear functional in X" (that 
is, of x_i considered as a linear functional on V') at this vector; the result is 
that 

α'_ij = [x_i, A'y_j]. 

Since [x_i, A'y_j] = [Ax_i, y_j] = α_ji, so that α'_ij = α_ji, this matrix [A'] is 
called the transpose of [A]. 
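
The recipe can be tried on numbers. The following is a minimal sketch, assuming 
that vectors and functionals are given by their coordinates in a basis and its 
dual basis, so that [x, y] is the sum of the products of corresponding 
coordinates; the helper names and the sample entries are made up for the 
illustration. 

```python
# Illustrative sketch: matrices are lists of rows, vectors are lists.
# All names and numbers here are assumptions made for the example.

def mat_vec(a, x):
    """Apply the matrix a to the coordinate column x."""
    return [sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(a))]

def mat_mul(a, b):
    """Ordinary matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Interchange rows and columns."""
    return [list(col) for col in zip(*a)]

def pairing(x, y):
    """[x, y] for x in a basis X and y in the dual basis X'."""
    return sum(xi * yi for xi, yi in zip(x, y))

A = [[1, 2], [3, 4]]
B = [[0, 1], [5, -2]]
x = [7, -1]
y = [2, 3]

lhs = pairing(mat_vec(A, x), y)              # [Ax, y]
rhs = pairing(x, mat_vec(transpose(A), y))   # [x, A'y], with [A'] the transpose
```

With these numbers both sides equal 61, and the transpose of the product of A 
and B agrees with the product of the transposes in the reverse order, as rule 
(6) of § 44 requires. 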





Observe that our results on the relation between E and E' (where E 
is a projection) could also have been derived by using the facts about the 
matricial representation of a projection together with the present result 
on the matrices of adjoint transformations. 
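
As a hedged illustration of this remark (the particular subspaces and 
coordinates are chosen here for the example and do not come from the text), 
take V = C², let M be the line ξ_1 = ξ_2 and N the line ξ_1 = 0, let E be the 
projection on M along N, and write a functional in the dual basis as 
y = (η_1, η_2), so that [x, y] = ξ_1η_1 + ξ_2η_2. 

```latex
% Illustrative example; M, N, and all coordinates are assumptions of this sketch.
\[
[E] = \begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix}, \qquad
[E'] = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}, \qquad
E'(\eta_1, \eta_2) = (\eta_1 + \eta_2,\ 0),
\]
\[
M^{0} = \{ (\eta_1, \eta_2) : \eta_1 + \eta_2 = 0 \}, \qquad
N^{0} = \{ (\eta_1, \eta_2) : \eta_2 = 0 \}.
\]
```

The solutions of E'y = y are exactly the y with η_2 = 0, that is, N°, and the 
solutions of E'y = 0 are exactly the y with η_1 + η_2 = 0, that is, M°, in 
agreement with Theorem 1. 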

§ 46. Change of basis 

Although what we have been doing with linear transformations so far 
may have been complicated, it was to a large extent automatic. Having 
introduced the new concept of linear transformation, we merely let some 
of the preceding concepts suggest ways in which they are connected with 
linear transformations. We now begin the proper study of linear trans- 
formations. As a first application of the theory we shall solve the problems 
arising from a change of basis. These problems can be formulated without 
mentioning linear transformations, but their solution is most effectively 
given in terms of linear transformations. 

Let V be an n-dimensional vector space and let X = {x_1, ..., x_n} and 
Y = {y_1, ..., y_n} be two bases in V. We may ask the following two 
questions. 

Question I. If x is in V, x = Σ_i ξ_i x_i = Σ_i η_i y_i, what is the relation 
between its coordinates (ξ_1, ..., ξ_n) with respect to X and its coordinates 
(η_1, ..., η_n) with respect to Y? 

Question II. If (ξ_1, ..., ξ_n) is an ordered set of n scalars, what is the 
relation between the vectors x = Σ_i ξ_i x_i and y = Σ_i ξ_i y_i? 

Both these questions are easily answered in the language of linear 
transformations. We consider, namely, the linear transformation A defined 
by Ax_i = y_i, i = 1, ..., n. More explicitly: 

A(Σ_i ξ_i x_i) = Σ_i ξ_i y_i. 

Let (α_ij) be the matrix of A in the basis X, that is, y_j = Ax_j = Σ_i α_ij x_i. 
We observe that A is invertible, since Σ_i ξ_i y_i = 0 implies that ξ_1 = ξ_2 
= ... = ξ_n = 0. 

ANSWER TO QUESTION I. Since 

Σ_j η_j y_j = Σ_j η_j Ax_j = Σ_j η_j Σ_i α_ij x_i 

= Σ_i (Σ_j α_ij η_j) x_i, 

we have 

(1) ξ_i = Σ_j α_ij η_j. 

ANSWER TO QUESTION II. 

(2) y = Ax. 

Roughly speaking, the invertible linear transformation A (or, more 
properly, the matrix (α_ij)) may be considered as a transformation of 
coordinates (as in (1)), or it may be considered (as we usually consider it, 
in (2)) as a transformation of vectors. 
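
Formula (1) can be checked on a small case. The sketch below is an 
illustration only: it assumes that X is the natural basis of a two-dimensional 
coordinate space and that Y = {(1, 1), (1, -1)}; the names and the numbers are 
made up for the example. 

```python
# Illustrative sketch of formula (1): xi_i = sum_j alpha_ij * eta_j.
# Assumption: X is the natural basis, so each y_j is already written in X-coordinates.

def mat_vec(a, v):
    """Apply the matrix a (list of rows) to the coordinate column v."""
    return [sum(a[i][j] * v[j] for j in range(len(v))) for i in range(len(a))]

y_basis = [(1, 1), (1, -1)]          # y_1 and y_2 in X-coordinates
# Column j of alpha holds the X-coordinates of y_j = A x_j.
alpha = [[y_basis[j][i] for j in range(2)] for i in range(2)]

eta = [3, 2]                          # coordinates of x with respect to Y
xi = mat_vec(alpha, eta)              # formula (1)

# Direct computation of x = eta_1 * y_1 + eta_2 * y_2 in X-coordinates.
direct = [sum(eta[j] * y_basis[j][i] for j in range(2)) for i in range(2)]
```

Both computations give the coordinates (5, 1): the matrix of A carries the new 
coordinates η into the old coordinates ξ, while the transformation A itself 
carries the old basis vectors x_i into the new basis vectors y_i. 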

In classical treatises on vector spaces it is customary to treat vectors 
as numerical n-tuples, rather than as abstract entities; this necessitates 
the introduction of some cumbersome terminology. We give here a brief 
glossary of some of the more baffling terms and notations that arise in con- 
nection with dual spaces and adjoint transformations. 

If V is an n-dimensional vector space, a vector x is given by its 
coordinates with respect to some preferred, absolute coordinate system; 
these coordinates form an ordered set of n scalars. It is customary to 
write this set of scalars in a column, 

    [ ξ_1 ] 
    [  .  ] 
    [  .  ] 
    [ ξ_n ] 

Elements of the dual space V' are written as rows, x' = (ξ'_1, ..., ξ'_n). 
If we think of x as a (rectangular) n-by-one matrix, and of x' as a one-by-n 
matrix, then the matrix product x'x is a one-by-one matrix, that is, a 
scalar. In our notation this scalar is [x, x'] = ξ_1ξ'_1 + ... + ξ_nξ'_n. The trick 
of considering vectors as thin matrices works even when we consider the 
full-grown matrices of linear transformations. Thus the matrix product of 
(α_ij) with the column (ξ_j) is the column whose i-th element is η_i = Σ_j α_ij ξ_j. 
Instead of worrying about dual bases and adjoint transformations, we 
may form similarly the product of the row (ξ'_j) with the matrix (α_ij) in 
the order (ξ'_j)(α_ij); the result is the row that we earlier denoted by y' = A'x'. 
The expression [Ax, x'] is now abbreviated as x'·A·x; both dots denote 
ordinary matrix multiplication. The vectors x in V are called covariant and 
the vectors x' in V' are called contravariant. Since the notion of the product 
x'·x (that is, [x, x']) depends, from this point of view, on the coordinates of 
x and x', it becomes relevant to ask the following question: if we change 
basis in V, in accordance with the invertible linear transformation A, what 
must we do in V' to preserve the product x'·x? In our notation: if [x, x'] 
= [y, y'], where y = Ax, then how is y' related to x'? Answer: y' 
= (A')⁻¹x'. To express this whole tangle of ideas the classical terminology 
says that the vectors x vary cogrediently whereas the x' vary contragrediently. 
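
The answer y' = (A')⁻¹x' can be checked in one line from the defining 
property of the adjoint. The following sketch spells out the step, using only 
the notation of this section and the invertibility of A' (§ 44, (7)). 

```latex
\[
[y, y'] = [Ax, y'] = [x, A'y'] \quad \text{for all } x,
\]
\[
\text{so that } [x, x'] = [y, y'] \text{ for all } x
\iff A'y' = x'
\iff y' = (A')^{-1} x'.
\]
```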

§ 47. Similarity 

The following two questions are closely related to those of the preceding 
section. 

Question III. If B is a linear transformation on V, what is the relation 
between its matrix (β_ij) with respect to X and its matrix (γ_ij) with respect 
to Y? 

Question IV. If (β_ij) is a matrix, what is the relation between the linear 
transformations B and C defined, respectively, by Bx_j = Σ_i β_ij x_i and 
Cy_j = Σ_i β_ij y_i? 

Questions III and IV are explicit formulations of a problem we raised 
before: to one transformation there correspond (in different coordinate 
systems) many matrices (question III) and to one matrix there correspond 
many transformations (question IV). 

ANSWER TO QUESTION III. We have 

(1) Bx_j = Σ_i β_ij x_i 

and 

(2) By_j = Σ_i γ_ij y_i. 

Using the linear transformation A defined in the preceding section, we 
may write 

(3) By_j = BAx_j = B(Σ_k α_kj x_k) 

= Σ_k α_kj Bx_k = Σ_k α_kj Σ_i β_ik x_i = Σ_i (Σ_k β_ik α_kj) x_i, 

and 

(4) Σ_k γ_kj y_k = Σ_k γ_kj Ax_k = Σ_k γ_kj Σ_i α_ik x_i 

= Σ_i (Σ_k α_ik γ_kj) x_i. 

Comparing (2), (3), and (4), we see that 

Σ_k α_ik γ_kj = Σ_k β_ik α_kj. 

Using matrix multiplication, we write this in the dangerously simple form 

(5) [A][C] = [B][A]. 

The danger lies in the fact that three of the four matrices in (5) correspond 
to their linear transformations in the basis X; the fourth one (namely, 
the one we denoted by [C]) corresponds to B in the basis Y. With this 
understanding, however, (5) is correct. A more usual form of (5), adapted, 
in principle, to computing [C] when [A] and [B] are known, is 

(6) [C] = [A]⁻¹[B][A]. 

ANSWER TO QUESTION IV. To bring out the essentially geometric character 
of this question and its answer, we observe that 

Cy_j = CAx_j 

and 

Cy_j = Σ_i β_ij y_i = Σ_i β_ij Ax_i = A(Σ_i β_ij x_i) = ABx_j. 

Hence C is such that 

CAx_j = ABx_j, 

or, finally, 

(7) C = ABA⁻¹. 

There is no trouble with (7) similar to the one that caused us to make a 
reservation about the interpretation of (6); to find the linear transformation 
(not matrix) C, we multiply the transformations A, B, and A⁻¹, and nothing 
needs to be said about coordinate systems. Compare, however, the 
formulas (6) and (7), and observe once more the innate perversity of 
mathematical symbols. This is merely another aspect of the facts already 
noted in §§ 37 and 38. 

Two matrices [B] and [C] are called similar if there exists an invertible 
matrix [A] satisfying (6) ; two linear transformations B and C are called 
similar if there exists an invertible transformation A satisfying (7). In 
this language the answers to questions III and IV can be expressed very 
briefly; in both cases the answer is that the given matrices or transforma- 
tions must be similar. 
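
Formula (6) lends itself to a quick numerical check. The sketch below is an 
illustration under stated assumptions: the space is two-dimensional, X is the 
natural basis, Y = {(1, 1), (1, -1)}, and the entries of [B] are made up; exact 
rational arithmetic from the standard library is used so that no rounding 
intervenes. 

```python
from fractions import Fraction

def mat_mul(a, b):
    """Ordinary matrix product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inverse_2x2(a):
    """Exact inverse of a 2-by-2 integer matrix; raises if it is singular."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[Fraction(a[1][1], det), Fraction(-a[0][1], det)],
            [Fraction(-a[1][0], det), Fraction(a[0][0], det)]]

A = [[1, 1], [1, -1]]   # column j: coordinates of y_j = A x_j in the basis X
B = [[2, 0], [1, 3]]    # matrix of B in the basis X (made-up entries)

# Formula (6): the matrix of the same transformation B in the basis Y.
C = mat_mul(inverse_2x2(A), mat_mul(B, A))
```

Here [C] comes out as the matrix with rows (3, 0) and (-1, 2). Its first column 
says that By_1 = 3y_1 - y_2, which agrees with the direct computation 
B(1, 1) = (2, 4) = 3(1, 1) - (1, -1). 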

Having obtained the answer to question IV, we see now that there are 
too many subscripts in its formulation. The validity of (7) is a geometric 
fact quite independent of linearity, finite-dimensionality, or any other 
accidental property that A, B, and C may possess; the answer to question 
IV is also the answer to a much more general question. This geometric 
question, a paraphrase of the analytic formulation of question IV, is this: 
If B transforms V, and if C transforms AV the same way, what is the 
relation between B and C? The expression “the same way” is not so vague 
as it sounds; it means that if B takes x into, say, u, then C takes Ax into 
Au. The answer is, of course, the same as before: since Bx = u and 
Cy = v (where y = Ax and v = Au), we have 

ABx = Au = v = Cy = CAx. 





The situation is conveniently summed up in the following mnemonic 
diagram: 

            B 
      x ---------> u 
      |            | 
    A |            | A 
      ↓            ↓ 
      y ---------> v 
            C 

We may go from y to v by using the short cut C, or by going around 
the block; in other words C = ABA⁻¹. Remember that ABA⁻¹ is to 
be applied to y from right to left: first A⁻¹, then B, then A. 

We have seen that the theory of changing bases is coextensive with the 
theory of invertible linear transformations. An invertible linear 
transformation is an automorphism, where by an automorphism we mean an 
isomorphism of a vector space with itself. (See § 9.) We observe that, 
conversely, every automorphism is an invertible linear transformation. 

We hope that the relation between linear transformations and matrices 
is by now sufficiently clear that the reader will not object if in the sequel, 
when we wish to give examples of linear transformations with various 
properties, we content ourselves with writing down a matrix. The 
interpretation always to be placed on this procedure is that we have in mind 
the concrete vector space Cⁿ (or one of its generalized versions Fⁿ) and the 
concrete basis X = {x_1, ..., x_n} defined by x_i = (δ_i1, ..., δ_in). With 
this understanding, a matrix (α_ij) defines, of course, a unique linear 
transformation A, given by the usual formula A(Σ_i ξ_i x_i) = Σ_i (Σ_j α_ij ξ_j) x_i. 


EXERCISES 

1. If A is a linear transformation from a vector space U to a vector space V, 
then corresponding to each fixed y in V' there exists a vector, which might as well 
be denoted by A'y, in U' so that 

[Ax, y] = [x, A'y] 

for all x in U. Prove that A' is a linear transformation from V' to U'. (The 
transformation A' is called the adjoint of A.) Interpret and prove as many as possible 
among the equations § 44, (2)-(8) for this concept of adjoint. 

2. (a) Prove that similarity of linear transformations on a vector space is an 
equivalence relation (that is, it is reflexive, symmetric, and transitive). 

(b) If A is similar to a scalar α, then A = α. 

(c) If A and B are similar, then so also are A² and B², A' and B', and, in case 
A and B are invertible, A⁻¹ and B⁻¹. 

(d) Generalize the concept of similarity to two transformations defined on dif- 
ferent vector spaces. Which of the preceding results remain valid for the gener- 
alized concept? 

3. (a) If A and B are linear transformations on the same vector space and if at 
least one of them is invertible, then AB and BA are similar. 

(b) Does the conclusion of (a) remain valid if neither A nor B is invertible? 

4. If the matrix of a linear transformation A on C², with respect to the basis 
{(1, 0), (0, 1)}, is [...], what is the matrix of A with respect to the basis 
{(1, 1), (1, -1)}? What about the basis {(1, 0), (1, 1)}? 

5. If the matrix of a linear transformation A on C³, with respect to the basis 
{(1, 0, 0), (0, 1, 0), (0, 0, 1)}, is 

    (  0   1   1 ) 
    (  1   0  -1 ) 
    ( -1  -1   0 ), 

what is the matrix of A with respect to the basis 
{(0, 1, -1), (1, -1, 1), (-1, 1, 0)}? 

6. (a) The construction of a matrix associated with a linear transformation 
depends on two bases, not one. Indeed, if X = {x_1, ..., x_n} and X̄ = {x̄_1, ..., x̄_n} 
are bases of V, and if A is a linear transformation on V, then the matrix [A; X, X̄] 
of A with respect to X and X̄ should be defined by 

Ax_j = Σ_i α_ij x̄_i. 

The definition adopted in the text corresponds to the special case in which X̄ = X. 
The special case leads to the definition of similarity (B and C are similar if there 
exist bases X and Y such that [B; X] = [C; Y]). The analogous relation suggested 
by the general case is called equivalence; B and C are equivalent if there exist basis 
pairs (X, X̄) and (Y, Ȳ) such that [B; X, X̄] = [C; Y, Ȳ]. Prove that this notion 
is indeed an equivalence relation. 

(b) Two linear transformations B and C are equivalent if and only if there exist 
invertible linear transformations P and Q such that PB = CQ. 

(c) If A and B are equivalent, then so also are A' and B'. 

(d) Does there exist a linear transformation A such that A is equivalent to a 
scalar α, but A ≠ α? 

(e) Do there exist linear transformations A and B such that A and B are 
equivalent, but A² and B² are not? 

(f) Generalize the concept of equivalence to two transformations defined on 
different vector spaces. Which of the preceding results remain valid for the gener- 
alized concept? 


§ 48. Quotient transformations 

Suppose that A is a linear transformation on a vector space V and that 
M is a subspace of V invariant under A. Under these circumstances there 
is a natural way of defining a linear transformation (to be denoted by 
A/M) on the space V/M; this “quotient transformation” is related to A 
just about the same way as the quotient space is related to V. It will 
be convenient (in this section) to denote V/M by the more compact 
symbol V⁻, and to use related symbols for the vectors and the linear 
transformations that occur. Thus, for instance, if x is any vector in V, we 
shall denote the coset x + M by x⁻; objects such as x⁻ are the typical 
elements of V⁻. 

To define the quotient transformation A/M (to be denoted, alternatively, 
by A⁻), write 

A⁻x⁻ = (Ax)⁻ 

for every vector x in V. In other words, to find the transform by A/M 
of the coset x + M, first find the transform by A of the vector x, and then 
form the coset of M determined by that transformed vector. This 
definition must be supported by an unambiguity argument; we must be sure 
that if two vectors determine the same coset, then the same is true of their 
transforms by A. The key fact here is the invariance of M. Indeed, if 
x + M = y + M, then x - y is in M, so that (invariance) Ax - Ay 
is in M, and therefore Ax + M = Ay + M. 

What happens if M is not merely invariant under A, but, together with 
a suitable subspace N, reduces A? If this happens, then A is the direct 
sum, say A = B ⊕ C, of two linear transformations defined on the 
subspaces M and N of V, respectively; the question is, what is the relation 
between A⁻ and C? Both these transformations can be considered as 
complementary to A; the transformation B describes what A does on M, 
and both A⁻ and C describe in different ways what A does elsewhere. 

Let T be the correspondence that assigns to each vector x in N the coset 
x⁻ (= x + M). We know already that T is an isomorphism between 
N and V/M (cf. § 22, Theorem 1); we shall show now that the isomorphism 
carries the transformation C over to the transformation A⁻. If Cx = y 
(where, of course, x is in N), then A⁻x⁻ = (Ax)⁻ = (Cx)⁻ = y⁻; it 
follows that TCx = Ty = A⁻Tx. This implies that TC = A⁻T, as 
promised. Loosely speaking (see § 47) we may say that A⁻ transforms 
V⁻ the same way as C transforms N. In other words, the linear 
transformations A⁻ and C are abstractly identical (isomorphic). This fact is of great 
significance in the applications of the concept of quotient space. 
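
In matrix terms the situation can be sketched as follows. This is an 
illustration under an assumption not made in the text: V is n-dimensional, 
and a basis x_1, ..., x_n of V is chosen so that x_1, ..., x_m is a basis of 
the invariant subspace M. The block names B_0, D, and C_0 are introduced 
here only for the example. 

```latex
% Assumption: x_1, ..., x_m span M; the blocks B_0, D, C_0 are illustrative names.
\[
[A] = \begin{pmatrix} B_0 & D \\ 0 & C_0 \end{pmatrix}
\]
% B_0 : m-by-m matrix of the restriction of A to M in the basis x_1, ..., x_m.
% 0   : the zero block below B_0 expresses the invariance of M.
% C_0 : matrix of the quotient transformation A/M in the basis of V/M
%       formed by the cosets of x_{m+1}, ..., x_n.
% If M and the span N of x_{m+1}, ..., x_n reduce A, then D = 0 and C_0 = [C].
```

Passing to cosets discards the block D, which records only the components in 
M of the vectors Ax_j for j > m. 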


§ 49. Range and null-space 

Definition. If A is a linear transformation on a vector space V and if 
M is a subspace of V, the image of M under A, in symbols AM, is the 
set of all vectors of the form Ax with x in M. The range of A is the set 
R(A) = AV; the null-space of A is the set N(A) of all vectors x for 
which Ax = 0. 

It is immediately verified that AM and N(A) are subspaces. If, as 
usual, we denote by O the subspace containing the vector 0 only, it is easy 
to describe some familiar concepts in terms of the terminology just 
introduced; we list some of the results. 

(i) The transformation A is invertible if and only if R(A) = V and 
N(A) = O. 

(ii) In case V is finite-dimensional, A is invertible if and only if R(A) 
= V or N(A) = O. 

(iii) The subspace M is invariant under A if and only if AM ⊂ M. 

(iv) A pair of complementary subspaces M and N reduce A if and only 
if AM ⊂ M and AN ⊂ N. 

(v) If E is the projection on M along N, then R(E) = M and N(E) = N. 

All these statements are easy to prove; we indicate the proof of (v). 
From § 41, Theorem 2, we know that N is the set of all solutions of the 
equation Ex = 0; this coincides with our definition of N(E). We know 
also that M is the set of all solutions of the equation Ex = x. If x is in 
M, then x is also in R(E), since x is the image under E of something (namely 
of x itself). Conversely, if a vector x is the image under E of something, 
say, x = Ey (so that x is in R(E)), then Ex = E²y = Ey = x, so that 
x is in M. 

Warning: it is accidental that for projections R ⊕ N = V. In general 
it need not even be true that R = R(A) and N = N(A) are disjoint. It 
can happen, for example, that for a certain vector x we have x ≠ 0, 
Ax ≠ 0, and A²x = 0; for such a vector, Ax clearly belongs to both the 
range and the null-space of A. 
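
A concrete instance of the warning, offered as an illustration (the matrix 
and the vector are chosen here for the example), is the following. 

```latex
\[
A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad
x = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \qquad
Ax = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \neq 0, \qquad
A^{2}x = 0.
\]
```

Here the range and the null-space of A coincide: each consists of the 
multiples of (1, 0). 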

Theorem. If A is a linear transformation on a vector space V, then 

(1) (R(A))° = N(A'); 

if V is finite-dimensional, then 

(2) (N(A))° = R(A'). 

proof. If y is in (R(A))°, then, for all x in V, 

0 = [Ax, y] = [x, A'y], 

so that A'y = 0 and y is in N(A'). If, on the other hand, y is in N(A'), 
then, for all x in V, 

0 = [x, A'y] = [Ax, y], 

so that y is in (R(A))°. 

If we apply (1) to A' in place of A, we obtain 

(3) (R(A'))° = N(A"). 

If V is finite-dimensional (and hence reflexive), we may replace A" by A 
in (3), and then we may form the annihilator of both sides; the desired 
conclusion (2) follows from § 17, Theorem 2. 
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EXERCISES 

1. Use the differentiation operator on (P« to show that the range and the null- 
space of a linear transformation need not be disjoint. 

2. (a) Give an example of a linear transformation on a three-dimensional space 
with a two-dimensional range. 

(b) Give an example of a linear transformation on a three-dimensional space 
with a two-dimensional null-space. 

3. Find a four-by-four matrix whose range is spanned by (1, 0, 1, 0) and (0, 1, 0, 1). 

4. (a) Two projections E and F have the same range if and only if EF = F and 
FE = E. 

(b) Two projections E and F have the same null-space if and only if EF = E 
and FE = F. 

5. If E_1, ..., E_k are projections with the same range and if α_1, ..., α_k are scalars 
such that Σ_i α_i = 1, then Σ_i α_i E_i is a projection. 


§ 50. Rank and nullity 

We shall now restrict attention to the finite-dimensional case and draw 
certain easy conclusions from the theorem of the preceding section. 

Definition. The rank, ρ(A), of a linear transformation A on a 
finite-dimensional vector space is the dimension of R(A); the nullity, ν(A), 
is the dimension of N(A). 

Theorem 1. If A is a linear transformation on an n-dimensional vector 
space, then ρ(A) = ρ(A') and ν(A) = n - ρ(A). 

proof. The theorem of the preceding section and § 17, Theorem 1, 
together imply that 

(1) ν(A') = n - ρ(A). 

Let X = {x_1, ..., x_n} be any basis for which x_1, ..., x_ν are in N(A); 
then, for any x = Σ_i ξ_i x_i, we have 

Ax = Σ_{i=ν+1}^{n} ξ_i Ax_i. 

In other words, Ax is a linear combination of the n - ν vectors Ax_{ν+1}, 
..., Ax_n; it follows that ρ(A) ≤ n - ν(A). Applying this result to A' 
and using (1), we obtain 

(2) ρ(A') ≤ n - ν(A') = ρ(A). 

In (2) we may replace A by A', obtaining 

(3) ρ(A) = ρ(A") ≤ ρ(A'); 

(2) and (3) together show that 

(4) ρ(A) = ρ(A'), 

and (1) and (4) together show that 

(5) ν(A') = n - ρ(A'). 

Replacing A by A' in (5) gives, finally, 

(6) ν(A) = n - ρ(A), 

and concludes the proof of the theorem. 

These results are usually discussed from a little different point of view. 
Let A be a linear transformation on an n-dimensional vector space, and 
let X = {x_1, ..., x_n} be a basis in that space; let [A] = (α_ij) be the matrix 
of A in the coordinate system X, so that 

Ax_j = Σ_i α_ij x_i. 

Since if x = Σ_j ξ_j x_j, then Ax = Σ_j ξ_j Ax_j, it follows that every vector 
in R(A) is a linear combination of the Ax_j, and hence of any maximal 
linearly independent subset of the Ax_j. It follows that the maximal 
number of linearly independent Ax_j is precisely ρ(A). In terms of the 
coordinates (α_1j, ..., α_nj) of Ax_j we may express this by saying that ρ(A) 
is the maximal number of linearly independent columns of the matrix 
[A]. Since (§ 45) the columns of [A'] (the matrix being expressed in terms 
of the dual basis of X) are the rows of [A], it follows from Theorem 1 that 
ρ(A) is also the maximal number of linearly independent rows of [A]. 
Hence “the row rank of [A] = the column rank of [A] = the rank of [A].” 
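
The equality of row rank and column rank can be observed numerically. The 
sketch below is an illustration only: it computes the rank by Gaussian 
elimination in exact rational arithmetic, a standard technique that is not the 
method of the text, and the function name and sample matrix are made up for 
the example. 

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        # Look for a non-zero entry in column c at or below row r.
        pivot = next((i for i in range(r, n_rows) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Clear the entries below the pivot.
        for i in range(r + 1, n_rows):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def transpose(a):
    """Interchange rows and columns."""
    return [list(col) for col in zip(*a)]

A = [[1, 2, 3],
     [2, 4, 6],   # twice the first row
     [1, 0, 1]]

column_rank = rank(A)            # elimination on A
row_rank = rank(transpose(A))    # elimination on the transpose
```

For this matrix both numbers are 2, and the nullity is therefore 3 - 2 = 1, as 
Theorem 1 predicts. 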

Theorem 2. If A is a linear transformation on the n-dimensional vector 
space V, and if H is any h-dimensional subspace of V, then the dimension 
of AH is ≥ h - ν(A). 

proof. Let K be any subspace for which V = H ⊕ K, so that if k is 
the dimension of K, then k = n - h. Upon operating with A we obtain 

AV = AH + AK. 

(The sum is not necessarily a direct sum; see § 11.) Since AV = R(A) 
has dimension n - ν(A), since the dimension of AK is clearly ≤ k = n - h, 
and since the dimension of the sum is ≤ the sum of the dimensions, we have 
the desired result. 

Theorem 3. If A and B are linear transformations on a finite-dimensional 
vector space, then 

(7) ρ(A + B) ≤ ρ(A) + ρ(B), 

(8) ρ(AB) ≤ min {ρ(A), ρ(B)}, 

and 

(9) ν(AB) ≤ ν(A) + ν(B). 

If B is invertible, then 

(10) ρ(AB) = ρ(BA) = ρ(A). 

proof. Since (AB)x = A(Bx), it follows that R(AB) is contained in 
R(A), so that ρ(AB) ≤ ρ(A), or, in other words, the rank of a product is 
not greater than the rank of the first factor. Let us apply this auxiliary 
result to B'A'; this, together with what we already know, yields (8). If 
B is invertible, then 

ρ(A) = ρ(AB·B⁻¹) ≤ ρ(AB) 

and 

ρ(A) = ρ(B⁻¹·BA) ≤ ρ(BA); 

together with (8) this yields (10). The equation (7) is an immediate 
consequence of an argument we have already used in the proof of Theorem 2. 
The proof of (9) we leave as an exercise for the reader. (Hint: apply 
Theorem 2 with H = BV = R(B).) Together the two formulas (8) and 
(9) are known as Sylvester’s law of nullity. 
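
The inequalities (7) and (9) cannot be improved in general. The following 
two-by-two matrices, chosen here purely as an illustration, attain equality 
in each. 

```latex
% Illustrative matrices; A is nilpotent and A' denotes its transpose.
\[
A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad
\rho(A) = 1, \quad \nu(A) = 1.
\]
\[
B = A: \quad AB = 0, \quad \rho(AB) = 0 \le 1, \quad
\nu(AB) = 2 = \nu(A) + \nu(B).
\]
\[
B = A' = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}: \quad
A + B = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad
\rho(A + B) = 2 = \rho(A) + \rho(B).
\]
```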

§ 51. Transformations of rank one 

We conclude our discussion of rank by a description of the matrices of 
linear transformations of rank ≤ 1. 

Theorem 1. If a linear transformation A on a finite-dimensional vector 
space V is such that ρ(A) ≤ 1 (that is, ρ(A) = 0 or ρ(A) = 1), then the 
elements of the matrix [A] = (α_ij) of A have the form α_ij = β_i γ_j in every 
coordinate system; conversely if the matrix of A has this form in some one 
coordinate system, then ρ(A) ≤ 1. 

proof. If ρ(A) = 0, then A = 0, and the statement is trivial. If 
ρ(A) = 1, that is, R(A) is one-dimensional, then there exists in R(A) a 
non-zero vector x_0 (a basis in R(A)) such that every vector in R(A) is a 
multiple of x_0. Hence, for every x, 

Ax = y_0 x_0, 

where the scalar coefficient y_0 (= y_0(x)) depends, of course, on x. The 
linearity of A implies that y_0 is a linear functional on V. Let X = {x_1, 
..., x_n} be a basis in V, and let (α_ij) be the corresponding matrix of A, 
so that 

Ax_j = Σ_i α_ij x_i. 

If X' = {y_1, ..., y_n} is the dual basis in V', then (cf. § 45, (2)) 

α_ij = [Ax_j, y_i]. 

In the present case 

α_ij = [y_0(x_j)x_0, y_i] = y_0(x_j)[x_0, y_i] = [x_0, y_i][x_j, y_0]; 

in other words, we may take β_i = [x_0, y_i] and γ_j = [x_j, y_0]. 

Conversely, suppose that in a fixed coordinate system X = {x_1, ..., x_n} 
the matrix (α_ij) of A is such that α_ij = β_i γ_j. We may find a linear 
functional y_0 such that γ_j = [x_j, y_0], and we may define a vector x_0 by x_0 
= Σ_i β_i x_i. The linear transformation Ã defined by Ãx = y_0(x)x_0 is 
clearly of rank one (unless, of course, α_ij = 0 for all i and j), and its matrix 
(α̃_ij) in the coordinate system X is given by 

α̃_ij = [Ãx_j, y_i] 

(where X' = {y_1, ..., y_n} is the dual basis of X). Hence 

α̃_ij = [y_0(x_j)x_0, y_i] = [x_0, y_i][x_j, y_0] = β_i γ_j = α_ij, 

and, since A and Ã have the same matrix in one coordinate system, it 
follows that Ã = A. This concludes the proof of the theorem. 
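
A small numerical instance may help fix the form α_ij = β_i γ_j. The numbers 
below are chosen here for illustration and are not from the text. 

```latex
% Illustrative values: beta = (1, 3), gamma = (2, 5).
\[
(\beta_1, \beta_2) = (1, 3), \quad (\gamma_1, \gamma_2) = (2, 5), \qquad
[A] = (\beta_i \gamma_j) = \begin{pmatrix} 2 & 5 \\ 6 & 15 \end{pmatrix}.
\]
\[
x_0 = x_1 + 3x_2, \qquad y_0(x) = 2\xi_1 + 5\xi_2, \qquad
Ax = y_0(x)\, x_0.
\]
```

Every column of [A] is a multiple of the column (1, 3), so that the range of A 
is one-dimensional. 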

The following theorem sometimes makes it possible to apply Theorem 
1 to obtain results about an arbitrary linear transformation. 

Theorem 2. If A is a linear transformation of rank p on a finite-di- 
mensional vector space V, then A may be written as the sum of p transforma- 
tions of rank one . 

proof. Since $A\mathcal{V} = \mathcal{R}(A)$ has dimension $\rho$, we may find $\rho$ vectors 
$x_1, \dots, x_\rho$ that form a basis for $\mathcal{R}(A)$. It follows that, for every vector 
$x$ in $\mathcal{V}$, we have 

$$Ax = \sum_{i=1}^{\rho} \xi_i x_i,$$

where each $\xi_i$ depends, of course, on $x$; we write $\xi_i = y_i(x)$. It is easy to 
see that $y_i$ is a linear functional. In terms of these $y_i$ we define, for each 
$i = 1, \dots, \rho$, a linear transformation $A_i$ by $A_i x = y_i(x)x_i$. It follows 
that each $A_i$ has rank one and $A = \sum_{i=1}^{\rho} A_i$. (Compare this result with 
§ 32, example (2).) 

A slight refinement of the proof just given yields the following result. 

Theorem 3. Corresponding to any linear transformation $A$ on a finite-
dimensional vector space $\mathcal{V}$ there is an invertible linear transformation $P$ 
for which $PA$ is a projection. 

proof. Let $\mathcal{R}$ and $\mathcal{N}$, respectively, be the range and the null-space of 
$A$, and let $\{x_1, \dots, x_\rho\}$ be a basis for $\mathcal{R}$. Let $x_{\rho+1}, \dots, x_n$ be vectors such 
that $\{x_1, \dots, x_n\}$ is a basis for $\mathcal{V}$. Since $x_i$ is in $\mathcal{R}$ for $i = 1, \dots, \rho$, we may 
find vectors $y_i$ such that $Ay_i = x_i$; finally, we choose a basis for $\mathcal{N}$, which 
we may denote by $\{y_{\rho+1}, \dots, y_n\}$. We assert that $\{y_1, \dots, y_n\}$ is a basis 
for $\mathcal{V}$. We need, of course, to prove only that the $y$'s are linearly 
independent. For this purpose we suppose that $\sum_{i=1}^{n} \alpha_i y_i = 0$; then we 
have (remembering that for $i = \rho + 1, \dots, n$ the vector $y_i$ belongs to $\mathcal{N}$) 

$$A\Big(\sum_{i=1}^{n} \alpha_i y_i\Big) = \sum_{i=1}^{\rho} \alpha_i x_i = 0,$$

whence $\alpha_1 = \cdots = \alpha_\rho = 0$. Consequently $\sum_{i=\rho+1}^{n} \alpha_i y_i = 0$; the linear 
independence of $y_{\rho+1}, \dots, y_n$ shows that the remaining $\alpha$'s must also vanish. 

A linear transformation $P$, of the kind whose existence we asserted, is 
now determined by the conditions $Px_i = y_i$, $i = 1, \dots, n$. Indeed, if 
$i = 1, \dots, \rho$, then $PAy_i = Px_i = y_i$, and if $i = \rho + 1, \dots, n$, then 
$PAy_i = P0 = 0$. 

Consideration of the adjoint of $A$, together with the reflexivity of $\mathcal{V}$, 
shows that we may also find an invertible $Q$ for which $AQ$ is a projection. 
In case $A$ itself is invertible, we must have $P = Q = A^{-1}$. 


EXERCISES 

1. What is the rank of the differentiation operator on $\mathcal{P}_n$? What is its nullity? 


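2. Find the ranks of the following matrices. 

(a) $\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{pmatrix}$, 

(b) $\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}$, 

(c) $\begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}$, 

(d) $\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}$. 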

3. If $A$ is left multiplication by $P$ on a space of linear transformations (cf. 
§ 38, Ex. 5), and if $P$ has rank $m$, what is the rank of $A$? 

4. The rank of the direct sum of two linear transformations (on finite-dimensional 
vector spaces) is the sum of their ranks. 

5. (a) If $A$ and $B$ are linear transformations on an $n$-dimensional vector space, 
and if $AB = 0$, then $\rho(A) + \rho(B) \le n$. 

(b) For each linear transformation $A$ on an $n$-dimensional vector space there 
exists a linear transformation $B$ such that $AB = 0$ and such that $\rho(A) + \rho(B) = n$. 

6. If $A$, $B$, and $C$ are linear transformations on a finite-dimensional vector space, 
then 

$$\rho(AB) + \rho(BC) \le \rho(B) + \rho(ABC).$$

7. Prove that two linear transformations (on the same finite-dimensional vector 
space) are equivalent if and only if they have the same rank. 

8. (a) Suppose that $A$ and $B$ are linear transformations (on the same finite-
dimensional vector space) such that $A^2 = A$ and $B^2 = B$. Is it true that $A$ and 
$B$ are similar if and only if $\rho(A) = \rho(B)$? 

(b) Suppose that $A$ and $B$ are linear transformations (on the same 
finite-dimensional vector space) such that $A \ne 0$, $B \ne 0$, and $A^2 = B^2 = 0$. Is it true 
that $A$ and $B$ are similar if and only if $\rho(A) = \rho(B)$? 

9. (a) If $A$ is a linear transformation of rank one, then there exists a unique 
scalar $\alpha$ such that $A^2 = \alpha A$. 

(b) If $\alpha \ne 1$, then $1 - A$ is invertible. 


§ 52. Tensor products of transformations 

Let us now tie up linear transformations with the theory of tensor 
products. Let $\mathcal{U}$ and $\mathcal{V}$ be finite-dimensional vector spaces (over the same 
field), and let $A$ and $B$ be any two linear transformations on $\mathcal{U}$ and $\mathcal{V}$ 
respectively. We define a linear transformation $\bar{C}$ on the space $\mathcal{W}$ of all 
bilinear forms on $\mathcal{U} \oplus \mathcal{V}$ by writing 

$$(\bar{C}w)(x, y) = w(Ax, By).$$

The tensor product $C = A \otimes B$ of the transformations $A$ and $B$ is, by 
definition, the dual of the transformation $\bar{C}$, so that 

$$(Cz)(w) = z(\bar{C}w)$$

whenever $z$ is in $\mathcal{U} \otimes \mathcal{V}$ and $w$ is in $\mathcal{W}$. If we apply $C$ to an element $z_0$ 
of the form $z_0 = x_0 \otimes y_0$ (recall that this means that $z_0(w) = w(x_0, y_0)$ 
for all $w$ in $\mathcal{W}$), we obtain 

$$(Cz_0)(w) = z_0(\bar{C}w) = (x_0 \otimes y_0)(\bar{C}w)$$

$$= (\bar{C}w)(x_0, y_0) = w(Ax_0, By_0) = (Ax_0 \otimes By_0)(w).$$

We infer that 

(1)    $Cz_0 = Ax_0 \otimes By_0.$

Since there are quite a few elements in $\mathcal{U} \otimes \mathcal{V}$ of the form $x \otimes y$, enough 
at any rate to form a basis (see § 25), this relation characterizes $C$. 

The formal rules for operating with tensor products go as follows. 

(2)    $A \otimes 0 = 0 \otimes B = 0,$

(3)    $1 \otimes 1 = 1,$

(4)    $(A_1 + A_2) \otimes B = (A_1 \otimes B) + (A_2 \otimes B),$

(5)    $A \otimes (B_1 + B_2) = (A \otimes B_1) + (A \otimes B_2),$

(6)    $\alpha A \otimes \beta B = \alpha\beta (A \otimes B),$

(7)    $(A \otimes B)^{-1} = A^{-1} \otimes B^{-1},$

(8)    $(A_1 A_2) \otimes (B_1 B_2) = (A_1 \otimes B_1)(A_2 \otimes B_2).$

The proofs of all these relations, except perhaps the last two, are 
straightforward. 

Formula (7), as all formulas involving inverses, has to be read with 
caution. It is intended to mean that if both A and B are invertible, then 
so is A ® B, and the equation holds, and, conversely, that if A ® B is 
invertible, then so also are A and B. We shall prove (7) and (8) in reverse 
order. 

Formula (8) follows from the characterization (1) of tensor products and 
the following computation: 

$$(A_1 A_2 \otimes B_1 B_2)(x \otimes y) = A_1 A_2 x \otimes B_1 B_2 y$$

$$= (A_1 \otimes B_1)(A_2 x \otimes B_2 y) = (A_1 \otimes B_1)(A_2 \otimes B_2)(x \otimes y).$$

As an immediate consequence of (8) we obtain 

(9)    $A \otimes B = (A \otimes 1)(1 \otimes B) = (1 \otimes B)(A \otimes 1).$

To prove (7), suppose that $A$ and $B$ are invertible, and form $A \otimes B$ 
and $A^{-1} \otimes B^{-1}$. Since, by (8), the product of these two transformations, 
in either order, is 1, it follows that $A \otimes B$ is invertible and that (7) holds. 
Conversely, suppose that $A \otimes B$ is invertible. Remembering that we 
defined tensor products for finite-dimensional spaces only, we may invoke 
§ 36, Theorem 2; it is sufficient to prove that $Ax = 0$ implies that $x = 0$ 
and $By = 0$ implies that $y = 0$. We use (1): 

$$Ax \otimes By = (A \otimes B)(x \otimes y).$$

If either factor on the left is zero, then $(A \otimes B)(x \otimes y) = 0$, whence 
$x \otimes y = 0$, so that either $x = 0$ or $y = 0$. Since (by (2)) $B = 0$ is 
impossible, we may find a vector $y$ so that $By \ne 0$. Applying the above 
argument to this $y$, with any $x$ for which $Ax = 0$, we conclude that $x = 0$. The 
same argument with the roles of $A$ and $B$ interchanged proves that $B$ is 
invertible. 

An interesting (and complicated) side of the theory of tensor products 
of transformations is the theory of Kronecker products of matrices. Let 
$\mathcal{X} = \{x_1, \dots, x_n\}$ and $\mathcal{Y} = \{y_1, \dots, y_m\}$ be bases in $\mathcal{U}$ and $\mathcal{V}$, and let 
$[A] = [A; \mathcal{X}] = (\alpha_{ij})$ and $[B] = [B; \mathcal{Y}] = (\beta_{pq})$ be the matrices of $A$ and $B$. 
What is the matrix of $A \otimes B$ in the coordinate system $\{x_i \otimes y_p\}$? 

To answer the question, we must recall the discussion in § 37 concerning 
the arrangement of a basis in a linear order. Since, unfortunately, it is 
impossible to write down a matrix without being committed to an order of 
the rows and the columns, we shall be frank about it, and arrange the $n$ 
times $m$ vectors $x_i \otimes y_p$ in the so-called lexicographical order, as follows: 

$$x_1 \otimes y_1, \; x_1 \otimes y_2, \; \dots, \; x_1 \otimes y_m, \; x_2 \otimes y_1, \; \dots, \; x_2 \otimes y_m, \; \dots, \; x_n \otimes y_1, \; \dots, \; x_n \otimes y_m.$$


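We proceed also to carry out the following computation: 

$$(A \otimes B)(x_j \otimes y_q) = Ax_j \otimes By_q = \Big(\sum_i \alpha_{ij} x_i\Big) \otimes \Big(\sum_p \beta_{pq} y_p\Big)$$

$$= \sum_i \sum_p \alpha_{ij} \beta_{pq} (x_i \otimes y_p).$$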


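This process indicates exactly how far we can get without ordering the 
basis elements; if, for example, we agree to index the elements of a matrix 
not with a pair of integers but with a pair of pairs, say $(i, p)$ and $(j, q)$, 
then we know now that the element in the $(i, p)$ row and the $(j, q)$ column 
is $\alpha_{ij}\beta_{pq}$. If we use the lexicographical ordering, the matrix of $A \otimes B$ has 
the form 

$$\begin{bmatrix}
\alpha_{11}\beta_{11} & \cdots & \alpha_{11}\beta_{1m} & \cdots & \alpha_{1n}\beta_{11} & \cdots & \alpha_{1n}\beta_{1m} \\
\vdots & & \vdots & & \vdots & & \vdots \\
\alpha_{11}\beta_{m1} & \cdots & \alpha_{11}\beta_{mm} & \cdots & \alpha_{1n}\beta_{m1} & \cdots & \alpha_{1n}\beta_{mm} \\
\vdots & & \vdots & & \vdots & & \vdots \\
\alpha_{n1}\beta_{11} & \cdots & \alpha_{n1}\beta_{1m} & \cdots & \alpha_{nn}\beta_{11} & \cdots & \alpha_{nn}\beta_{1m} \\
\vdots & & \vdots & & \vdots & & \vdots \\
\alpha_{n1}\beta_{m1} & \cdots & \alpha_{n1}\beta_{mm} & \cdots & \alpha_{nn}\beta_{m1} & \cdots & \alpha_{nn}\beta_{mm}
\end{bmatrix}$$

In a condensed notation whose meaning is clear we may write this matrix as 

$$\begin{bmatrix}
\alpha_{11}[B] & \cdots & \alpha_{1n}[B] \\
\vdots & & \vdots \\
\alpha_{n1}[B] & \cdots & \alpha_{nn}[B]
\end{bmatrix}$$

This matrix is known as the Kronecker product of $[A]$ and $[B]$, in that 
order. The rule for forming it is easy to describe in words: replace each 
element $\alpha_{ij}$ of the $n$-by-$n$ matrix $[A]$ by the $m$-by-$m$ matrix $\alpha_{ij}[B]$. If in 
this rule we interchange the roles of $A$ and $B$ (and consequently interchange 
$n$ and $m$) we obtain the definition of the Kronecker product of $[B]$ and $[A]$. 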
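
The rule just described translates directly into a few lines of code. The sketch below is an illustration only; the function names `kron` and `matmul` and the sample matrices are hypothetical choices made for this example. Rows and columns are numbered from 0, so that the pair $(i, p)$ in lexicographical order occupies position $i \cdot m + p$. The last line checks formula (8) in its matricial form on the sample data.

```python
def matmul(a, b):
    """Ordinary matrix product of two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def kron(a, b):
    """Kronecker product of a and b, in that order.

    The entry in row (i, p) and column (j, q), with pairs arranged in
    lexicographical order, is a[i][j] * b[p][q].
    """
    m, q = len(b), len(b[0])
    rows, cols = len(a) * m, len(a[0]) * q
    return [
        [a[r // m][c // q] * b[r % m][c % q] for c in range(cols)]
        for r in range(rows)
    ]

a1 = [[1, 2], [3, 4]]
a2 = [[0, 1], [1, 1]]
b1 = [[2, 0], [1, 1]]
b2 = [[1, -1], [0, 2]]
swap = [[0, 1], [1, 0]]

# Each entry of a1 is replaced by that entry times the 2-by-2 matrix swap.
block_form = kron(a1, swap)

# Matricial form of rule (8): (A1 A2) (x) (B1 B2) = (A1 (x) B1)(A2 (x) B2).
rule_8_holds = kron(matmul(a1, a2), matmul(b1, b2)) == matmul(kron(a1, b1), kron(a2, b2))
```

Interchanging the arguments, `kron(swap, a1)`, gives the Kronecker product in the other order; the two matrices are in general different.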


EXERCISES 

1. We know that the tensor product of $\mathcal{P}_n$ and $\mathcal{P}_m$ may be identified with the 
space $\mathcal{P}_{n,m}$ of polynomials in two variables (see § 25, Ex. 2). Prove that if $A$ and 
$B$ are differentiation on $\mathcal{P}_n$ and $\mathcal{P}_m$ respectively, and if $C = A \otimes B$, then $C$ is 
mixed partial differentiation, that is, if $z$ is in $\mathcal{P}_{n,m}$, then 
$Cz = \dfrac{\partial^2 z}{\partial s \, \partial t}$. 

2. With the lexicographic ordering of the product basis $\{x_i \otimes y_p\}$ it turned out 
that the matrix of $A \otimes B$ is the Kronecker product of the matrices of $A$ and $B$. 
Is there an arrangement of the basis vectors such that the matrix of $A \otimes B$, 
referred to the coordinate system so arranged, is the Kronecker product of the 
matrices of $B$ and $A$ (in that order)? 

3. If $A$ and $B$ are linear transformations, then 

$$\rho(A \otimes B) = \rho(A)\rho(B).$$


§ 53. Determinants 

It is, of course, possible to generalize the considerations of the preceding 
section to multilinear forms and multiple tensor products. Instead of 
entering into that part of multilinear algebra, we proceed in a different 
direction; we go directly after determinants. 

Suppose that $A$ is a linear transformation on an $n$-dimensional vector 
space $\mathcal{V}$ and let $w$ be an alternating $n$-linear form on $\mathcal{V}$. If we write $\bar{A}w$ 
for the function defined by 

$$(\bar{A}w)(x_1, \dots, x_n) = w(Ax_1, \dots, Ax_n),$$

then $\bar{A}w$ is an alternating $n$-linear form on $\mathcal{V}$, and, in fact, $\bar{A}$ is a linear 
transformation on the space of such forms. Since (see § 31) that space is 
one-dimensional, it follows that $\bar{A}$ is equal to multiplication by an 
appropriate scalar. In other words, there exists a scalar $\delta$ such that $\bar{A}w = \delta w$ 
for every alternating $n$-linear form $w$. By this somewhat roundabout 
procedure (from $A$ to $\bar{A}$ to $\delta$) we have associated a uniquely determined 
scalar $\delta$ with every linear transformation $A$ on $\mathcal{V}$; we call $\delta$ the determinant 
of $A$, and we write $\delta = \det A$. Observe that det is neither a scalar nor a 
transformation, but a function that associates a scalar with each linear 
transformation. 

Our immediate purpose is to study the function det. We begin by finding 
the determinants of the simplest linear transformations, that is, the 
multiplications by scalars. If $Ax = \alpha x$ for every $x$ in $\mathcal{V}$, then 

$$(\bar{A}w)(x_1, \dots, x_n) = w(\alpha x_1, \dots, \alpha x_n) = \alpha^n w(x_1, \dots, x_n)$$

for every alternating $n$-linear form $w$; it follows that $\det A = \alpha^n$. We note, 
in particular, that $\det 0 = 0$ and $\det 1 = 1$. 

Next we ask about the multiplicative properties of det. Suppose that 
$A$ and $B$ are linear transformations on $\mathcal{V}$, and write $C = AB$. If $w$ is 
an alternating $n$-linear form, then 

$$(\bar{C}w)(x_1, \dots, x_n) = w(ABx_1, \dots, ABx_n)$$

$$= (\bar{A}w)(Bx_1, \dots, Bx_n) = (\bar{B}\bar{A}w)(x_1, \dots, x_n),$$

so that $\bar{C} = \bar{B}\bar{A}$. Since 

$$\bar{C}w = (\det C)w$$

and 

$$\bar{B}\bar{A}w = (\det B)\bar{A}w = (\det B)(\det A)w,$$

it follows that 

$$\det (AB) = (\det A)(\det B).$$

(The values of det are scalars, and therefore commute with each other.) 

A linear transformation $A$ is called singular if $\det A = 0$ and non-singular 
otherwise. Our next result is that $A$ is invertible if and only if it is 
non-singular. Indeed, if $A$ is invertible, then 

$$1 = \det 1 = \det (AA^{-1}) = (\det A)(\det A^{-1}),$$

and therefore $\det A \ne 0$. Suppose, on the other hand, that $\det A \ne 0$. 
If $\{x_1, \dots, x_n\}$ is a basis in $\mathcal{V}$, and if $w$ is a non-zero alternating $n$-linear 
form on $\mathcal{V}$, then $(\det A)w(x_1, \dots, x_n) \ne 0$ by § 30, Theorem 3. This 
implies, by § 30, Theorem 2, that the set $\{Ax_1, \dots, Ax_n\}$ is linearly 
independent (and therefore a basis); from this, in turn, we infer that $A$ is 
invertible. 

In the classical literature determinant is defined as a function of matrices 
(not linear transformations); we are now in a position to make contact 
with that approach. We shall derive an expression for $\det A$ in terms of 
the elements $\alpha_{ij}$ of the matrix corresponding to $A$ in some coordinate 
system $\{x_1, \dots, x_n\}$. Let $w$ be a non-zero alternating $n$-linear form; we 
know that 

(1)    $(\det A)w(x_1, \dots, x_n) = w(Ax_1, \dots, Ax_n).$

If we replace each $Ax_j$ in the right side of (1) by $\sum_i \alpha_{ij} x_i$ and expand the 
result by multilinearity, we obtain a long linear combination of terms such 
as $w(z_1, \dots, z_n)$, where each $z$ is one of the $x$'s. (Compare this part of the 
argument with the proof of § 30, Theorem 3.) If, in such a term, two of 
the $z$'s coincide, then, since $w$ is alternating, that term must vanish. If, 
on the other hand, all the $z$'s are distinct, then $w(z_1, \dots, z_n) = \pi w(x_1, \dots, x_n)$ 
for some permutation $\pi$, and, moreover, every permutation $\pi$ can occur in 
this way. The coefficient of the term $\pi w(x_1, \dots, x_n)$ is the product 
$\alpha_{\pi(1),1} \cdots \alpha_{\pi(n),n}$. Since (§ 30, Theorem 1) $w$ is skew symmetric, it follows 
that 

(2)    $\det A = \sum_{\pi} (\operatorname{sgn} \pi)\, \alpha_{\pi(1),1} \cdots \alpha_{\pi(n),n}$

where the summation is extended over all permutations $\pi$ in $\mathcal{S}_n$. (Recall 
that $w(x_1, \dots, x_n) \ne 0$, by § 30, Theorem 3, so that division by 
$w(x_1, \dots, x_n)$ is legitimate.) 

From this classical equation (2) we could derive many special properties 
of determinants by straightforward computation. Here is one example. 
If $\sigma$ and $\pi$ are permutations (in $\mathcal{S}_n$), then (since $\pi\sigma$ is also a permutation), 
it follows that the products $\alpha_{\pi(1),1} \cdots \alpha_{\pi(n),n}$ and $\alpha_{\pi\sigma(1),\sigma(1)} \cdots \alpha_{\pi\sigma(n),\sigma(n)}$ 
differ in the order of their factors only. If, for each $\pi$, we take $\sigma = \pi^{-1}$, 
and then alter each summand in (2) accordingly, we obtain 

$$\det A = \sum_{\pi} (\operatorname{sgn} \pi)\, \alpha_{1,\pi^{-1}(1)} \cdots \alpha_{n,\pi^{-1}(n)}.$$

(Note that $\operatorname{sgn} \pi = \operatorname{sgn} \pi^{-1}$ and that the sum over all $\pi$ is the same as 
the sum over all $\pi^{-1}$.) Since this last sum is just like the sum in (2), except 
that $\alpha_{i,\pi(i)}$ appears in place of $\alpha_{\pi(i),i}$, it follows from an application of 
(2) to $A'$ in place of $A$ that 

$$\det A = \det A'.$$
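
Formula (2) can be evaluated mechanically for small matrices. The following Python sketch is meant only as an illustration (the sum has $n!$ terms, so it is of no use for large $n$); the function names and the sample matrices are hypothetical. It assumes that the matrix of $A'$ is the transpose of the matrix of $A$, and it checks the multiplicative property and the relation $\det A = \det A'$ on the sample data.

```python
from itertools import permutations

def sign(perm):
    """Sign of a permutation given as a tuple of 0-based images."""
    n = len(perm)
    inversions = sum(
        1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1

def det(a):
    """Determinant by formula (2): sum over pi of sgn(pi) * prod_j a[pi(j)][j]."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for j in range(n):
            term *= a[perm[j]][j]
        total += term
    return total

def transpose(a):
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

a = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]
b = [[1, 2, 0], [0, 1, 1], [3, 0, 2]]

product_rule_holds = det(matmul(a, b)) == det(a) * det(b)
transpose_rule_holds = det(transpose(a)) == det(a)
```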

Here is another useful fact about determinants. If $\mathfrak{M}$ is a subspace 
invariant under $A$, if $B$ is the transformation $A$ considered on $\mathfrak{M}$ only, 
and if $C$ is the quotient transformation $A/\mathfrak{M}$, then 

$$\det A = \det B \cdot \det C.$$

This multiplicative relation holds if, in particular, $A$ is the direct sum of 
two transformations $B$ and $C$. The proof can be based directly on the 
definition of determinants, or, alternatively, on the expansion obtained in 
the preceding paragraph. 

If, for a fixed linear transformation $A$, we write $p(\lambda) = \det (A - \lambda)$, 
then $p$ is a function of the scalar $\lambda$; we assert that it is, in fact, a 
polynomial of degree $n$ in $\lambda$, and that the coefficient of $\lambda^n$ is $(-1)^n$. For the 
proof we may use the notation of (1). It is easy to see that 
$w((A - \lambda)x_1, \dots, (A - \lambda)x_n)$ is a sum of terms such as $\lambda^k w(y_1, \dots, y_n)$, where $y_i = x_i$ 
for exactly $k$ values of $i$ and $y_i = Ax_i$ for the remaining $n - k$ values of 
$i$ ($k = 0, 1, \dots, n$). The polynomial $p$ is called the characteristic polynomial 
of $A$; the equation $p = 0$, that is, $\det (A - \lambda) = 0$, is the characteristic 
equation of $A$. The roots of the characteristic equation of $A$ (that is, 
the scalars $\alpha$ such that $\det (A - \alpha) = 0$) are called the characteristic roots 
of $A$. 
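
For a small matrix the coefficients of $\det (A - \lambda)$ can be computed by expanding the permutation sum with polynomial entries. The Python sketch below is one possible illustration, not an efficient method; the function names and the sample matrices are hypothetical, and a polynomial is represented by the list of its coefficients, lowest degree first.

```python
from itertools import permutations

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def char_poly(a):
    """Coefficients [c_0, ..., c_n] of p(lam) = det(A - lam), lowest degree first."""
    n = len(a)
    coeffs = [0] * (n + 1)
    for perm in permutations(range(n)):
        inversions = sum(
            1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j]
        )
        term = [-1 if inversions % 2 else 1]
        for j in range(n):
            i = perm[j]
            # Entry (i, j) of A - lam: a[i][j] - lam on the diagonal, a[i][j] elsewhere.
            entry = [a[i][j], -1] if i == j else [a[i][j]]
            term = poly_mul(term, entry)
        for k, c in enumerate(term):
            coeffs[k] += c
    return coeffs

upper = [[2, 1], [0, 3]]                       # det(A - lam) = (2 - lam)(3 - lam)
nilpotent = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]  # det(A - lam) = (-lam)^3

p_upper = char_poly(upper)
p_nilpotent = char_poly(nilpotent)
```

In both examples the last coefficient is $(-1)^n$ and the first is $\det A$, as the text asserts.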


EXERCISES 

1. Use determinants to get a new proof of the fact that if A and B are linear 
transformations on a finite-dimensional vector space, and if AB = 1, then both A 
and B are invertible. 

2. If $A$ and $B$ are linear transformations such that $AB = 0$, $A \ne 0$, $B \ne 0$, 
then $\det A = \det B = 0$. 

3. Suppose that $(\alpha_{ij})$ is a non-singular $n$-by-$n$ matrix, and suppose that $A_1, \dots, 
A_n$ are linear transformations (on the same vector space). Prove that if the linear 
transformations $\sum_j \alpha_{ij} A_j$, $i = 1, \dots, n$, commute with each other, then the same 
is true of $A_1, \dots, A_n$. 

4. If $\{x_1, \dots, x_n\}$ and $\{y_1, \dots, y_n\}$ are bases in the same vector space, and if $A$ 
is a linear transformation such that $Ax_i = y_i$, $i = 1, \dots, n$, then $\det A \ne 0$. 

5. Suppose that $\{x_1, \dots, x_n\}$ is a basis in a finite-dimensional vector space $\mathcal{V}$. 
If $y_1, \dots, y_n$ are vectors in $\mathcal{V}$, write $w(y_1, \dots, y_n)$ for the determinant of the linear 
transformation $A$ such that $Ax_j = y_j$, $j = 1, \dots, n$. Prove that $w$ is an alternating 
$n$-linear form. 

6. If, in accordance with § 53, (2), the determinant of a matrix $(\alpha_{ij})$ (not a 
linear transformation) is defined to be $\sum_{\pi} (\operatorname{sgn} \pi)\, \alpha_{\pi(1),1} \cdots \alpha_{\pi(n),n}$, then, for each 
linear transformation $A$, the determinants of all the matrices $[A; \mathcal{X}]$ are all equal 
to each other. (Here $\mathcal{X}$ is an arbitrary basis.) 

7. If $(\alpha_{ij})$ is an $n$-by-$n$ matrix such that $\alpha_{ij} = 0$ for more than $n^2 - n$ pairs of 
values of $i$ and $j$, then $\det (\alpha_{ij}) = 0$. 

8. If $A$ and $B$ are linear transformations on vector spaces of dimensions $n$ and 
$m$, respectively, then 

$$\det (A \otimes B) = (\det A)^m \cdot (\det B)^n.$$

9. If $A$, $B$, $C$, and $D$ are matrices such that $C$ and $D$ commute and $D$ is invertible, 
then (cf. § 38, Ex. 19) 

$$\det \begin{pmatrix} A & B \\ C & D \end{pmatrix} = \det (AD - BC).$$

(Hint: multiply on the right by $\begin{pmatrix} D & 0 \\ -C & D^{-1} \end{pmatrix}$.) What if $D$ is not invertible? What 
if $C$ and $D$ do not commute? 

10. Do A and A' always have the same characteristic polynomial? 

11. (a) If A and B are similar, then det A = det B. 

(b) If A and B are similar, then A and B have the same characteristic poly- 
nomial. 

(c) If A and B have the same characteristic polynomial, then det A = det B. 

(d) Is the converse of any of these assertions true? 

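12. Determine the characteristic polynomial of the matrix (or, rather, of the 
linear transformation defined by the matrix) 

$$\begin{bmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
\alpha_{n-1} & \alpha_{n-2} & \alpha_{n-3} & \cdots & \alpha_0
\end{bmatrix}$$

and conclude that every polynomial is the characteristic polynomial of some linear 
transformation. 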

13. Suppose that A and B are linear transformations on the same finite-di- 
mensional vector space. 

(a) Prove that if A is a projection, then AB and BA have the same charac- 
teristic polynomial. (Hint: choose a basis that makes the matrix of A as simple as 
possible and then compute directly with matrices.) 

(b) Prove that, in all cases, $AB$ and $BA$ have the same characteristic polynomial. 
(Hint: find an invertible $P$ such that $PA$ is a projection and apply (a) to $PA$ and 
$BP^{-1}$.) 


§ 54. Proper values 

A scalar $\lambda$ is a proper value and a non-zero vector $x$ is a proper vector of a 
linear transformation $A$ if $Ax = \lambda x$. Almost every combination of the 
adjectives proper, latent, characteristic, eigen, and secular, with the nouns 
root, number, and value, has been used in the literature for what we call a 
proper value. It is important to be aware of the order of choice in the 
definition; $\lambda$ is a proper value of $A$ if there exists a non-zero vector $x$ for 
which $Ax = \lambda x$, and a non-zero vector $x$ is a proper vector of $A$ if there 
exists a scalar $\lambda$ for which $Ax = \lambda x$. 

Suppose that $\lambda$ is a proper value of $A$; let $\mathfrak{M}$ be the collection of all vectors 
$x$ that are proper vectors of $A$ belonging to this proper value, that is, for 
which $Ax = \lambda x$. Since, by our definition, 0 is not a proper vector, $\mathfrak{M}$ does 
not contain 0; if, however, we enlarge $\mathfrak{M}$ by adjoining the origin to it, then 
$\mathfrak{M}$ becomes a subspace. We define the multiplicity of the proper value $\lambda$ as 
the dimension of the subspace $\mathfrak{M}$; a simple proper value is one whose 
multiplicity is equal to 1. By an obvious extension of this terminology, we 
may express the fact that a scalar $\lambda$ is not a proper value of $A$ at all by saying 
that $\lambda$ is a proper value of multiplicity zero. The set of proper values of $A$ 
is sometimes called the spectrum of $A$. Note that the spectrum of $A$ is the 
same as the set of all scalars $\lambda$ for which $A - \lambda$ is not invertible. 

If the vector space we are working with has dimension $n$, then the scalar 
0 is a proper value of multiplicity $n$ of the linear transformation 0, and, 
similarly, the scalar 1 is a proper value of multiplicity $n$ of the linear 
transformation 1. Since $Ax = \lambda x$ if and only if $(A - \lambda)x = 0$, that is, if and 
only if $x$ is in the null-space of $A - \lambda$, it follows that the multiplicity of $\lambda$ as 
a proper value of $A$ is the same as the nullity of the linear transformation 
$A - \lambda$. From this, in turn, we infer (see § 50, Theorem 1) that the proper 
values of $A$, together with their associated multiplicities, are exactly the 
same as those of $A'$. 

We observe that if $B$ is any invertible transformation, then 

$$BAB^{-1} - \lambda = B(A - \lambda)B^{-1},$$

so that $(A - \lambda)x = 0$ if and only if $(BAB^{-1} - \lambda)Bx = 0$. This implies 
that all spectral concepts (for example, the spectrum and the multiplicities 
of the proper values) are invariant under the replacement of $A$ by $BAB^{-1}$. 
We note also that if $Ax = \lambda x$, then 

$$A^2 x = A(Ax) = A(\lambda x) = \lambda(Ax) = \lambda(\lambda x) = \lambda^2 x.$$

More generally, if $p$ is any polynomial, then $p(A)x = p(\lambda)x$, so that every 
proper vector of $A$, belonging to the proper value $\lambda$, is also a proper vector 
of $p(A)$, belonging to the proper value $p(\lambda)$. Hence if $A$ satisfies any 
equation of the form $p(A) = 0$, then $p(\lambda) = 0$ for every proper value $\lambda$ of $A$. 
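
The relation $p(A)x = p(\lambda)x$ is easy to test on a concrete proper vector. In the Python sketch below the matrix, the vector, and the polynomial are sample data chosen for this illustration, and the helper names are hypothetical; $p(A)x$ is accumulated as $\sum_k c_k A^k x$ without ever forming the powers of $A$.

```python
def matvec(a, x):
    """Apply the matrix a (list of rows) to the vector x."""
    return [sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(a))]

def poly_of_matrix_times_vector(coeffs, a, x):
    """p(A)x for p(t) = coeffs[0] + coeffs[1] t + ..., computed as sum c_k A^k x."""
    result = [0] * len(x)
    power = list(x)  # A^0 x
    for c in coeffs:
        result = [r + c * v for r, v in zip(result, power)]
        power = matvec(a, power)  # next power of A applied to x
    return result

def poly_at_scalar(coeffs, lam):
    return sum(c * lam ** k for k, c in enumerate(coeffs))

a = [[2, 1], [0, 3]]
x = [1, 1]        # A x = (3, 3) = 3 x, so x is a proper vector with proper value 3
lam = 3
p = [1, -4, 2]    # p(t) = 1 - 4 t + 2 t^2

left = poly_of_matrix_times_vector(p, a, x)   # p(A) x
right = [poly_at_scalar(p, lam) * v for v in x]  # p(lam) x
```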

Since a necessary and sufficient condition that $A - \lambda$ have a non-trivial 
null-space is that it be singular, that is, that $\det (A - \lambda) = 0$, it follows 
that $\lambda$ is a proper value of $A$ if and only if it is a characteristic root of $A$. 
This fact is the reason for the importance of determinants in linear algebra. 
The useful geometric concept is that of a proper value. From the geometry 
of the situation, however, it is impossible to prove that any proper values 
exist. By means of determinants we reduce the problem to an algebraic 
one; it turns out that proper values are the same as roots of a certain 
polynomial equation. No wonder now that it is hard to prove that proper 
values always exist: polynomial equations do not always have roots, and, 
correspondingly, there are easy examples of linear transformations with no 
proper values. 


§ 55. Multiplicity 

The discussion in the preceding section indicates one of our reasons for 
wanting to study complex vector spaces. By the so-called fundamental 
theorem of algebra, a polynomial equation over the field of complex num- 
bers always has at least one root; it follows that a linear transformation on 
a complex vector space always has at least one proper value. There are 
other fields, besides the field of complex numbers, over which every poly- 
nomial equation is solvable; they are called algebraically closed fields. The 
most general result of the kind we are after at the moment is that every 
linear transformation on a finite-dimensional vector space over an algebrai- 
cally closed field has at least one proper value. Throughout the rest of this 
chapter (in the next four sections) we shall assume that our field of scalars 
is algebraically closed. The use we shall make of this assumption is the 
one just mentioned, namely, that from it we may conclude that proper 
values always exist. 

The algebraic point of view on proper values suggests another possible 
definition of multiplicity. Suppose that $A$ is a linear transformation on a 
finite-dimensional vector space, and suppose that $\lambda$ is a proper value of $A$. 
We might wish to consider the multiplicity of $\lambda$ as a root of the 
characteristic equation of $A$. This is a useful concept, which we shall call the 
algebraic multiplicity of $\lambda$, to distinguish it from our earlier, geometric, notion 
of multiplicity. 

The two concepts of multiplicity do not coincide, as the following 
example shows. If $D$ is differentiation on the space $\mathcal{P}_n$ of all polynomials of 
degree $\le n - 1$, then a necessary and sufficient condition that a vector $x$ 
in $\mathcal{P}_n$ be a proper vector of $D$ is that $\dfrac{dx}{dt} = \lambda x(t)$ for some complex number $\lambda$. 
We borrow from the elementary theory of differential equations the fact 
that every solution of this equation is a constant multiple of $e^{\lambda t}$. Since, 
unless $\lambda = 0$, only the zero multiple of $e^{\lambda t}$ is a polynomial (which it must be 
if it is to belong to $\mathcal{P}_n$), we must have $\lambda = 0$ and $x(t) = 1$. In other words, 
this particular transformation has only one proper value (which must 
therefore occur with algebraic multiplicity $n$), namely, $\lambda = 0$; but, and this is 
more disturbing, the dimension of the linear manifold of solutions is exactly 
one. Hence if $n > 1$, the two definitions of multiplicity give different 
values. (In this argument we used the simple fact that a polynomial equation 
of degree $n$ over an algebraically closed field has exactly $n$ roots, if 
multiplicities are suitably counted. It follows that a linear transformation on an 
$n$-dimensional vector space over such a field has exactly $n$ proper values, 
counting algebraic multiplicities.) 

It is quite easy to see that the geometric multiplicity of $\lambda$ is never greater 
than its algebraic multiplicity. Indeed, if $A$ is any linear transformation, 
if $\lambda_0$ is any of its proper values, and if $\mathfrak{M}$ is the subspace of solutions of 
$Ax = \lambda_0 x$, then it is clear that $\mathfrak{M}$ is invariant under $A$. If $A_0$ is the linear 
transformation $A$ considered on $\mathfrak{M}$ only, then it is clear that $\det (A_0 - \lambda)$ 
is a factor of $\det (A - \lambda)$. If the dimension of $\mathfrak{M}$ (= the geometric 
multiplicity of $\lambda_0$) is $m$, then $\det (A_0 - \lambda) = (\lambda_0 - \lambda)^m$; the desired result 
follows from the definition of algebraic multiplicity. It follows also that 
if $\lambda_1, \dots, \lambda_p$ are the distinct proper values of $A$, with respective geometric 
multiplicities $m_1, \dots, m_p$, and if it happens that $\sum_{i=1}^{p} m_i = n$, then $m_i$ is 
equal to the algebraic multiplicity of $\lambda_i$ for each $i = 1, \dots, p$. 
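
The differentiation example can be examined in matrix form. The Python sketch below is an illustration under stated assumptions: it uses the basis $1, t, \dots, t^{n-1}$ of $\mathcal{P}_n$ (so that $D t^j = j t^{j-1}$), computes the rank by Gaussian elimination over the rationals, and takes for granted that the determinant of a triangular matrix is the product of its diagonal entries. The function names are hypothetical.

```python
from fractions import Fraction

def differentiation_matrix(n):
    """Matrix of D on polynomials of degree <= n-1 in the basis 1, t, ..., t^(n-1)."""
    d = [[0] * n for _ in range(n)]
    for j in range(1, n):
        d[j - 1][j] = j  # D t^j = j t^(j-1)
    return d

def rank(a):
    """Rank by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in a]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [vi - factor * vr for vi, vr in zip(m[i], m[r])]
        r += 1
    return r

n = 4
d = differentiation_matrix(n)

# Geometric multiplicity of the proper value 0 = nullity of D - 0.
geometric_multiplicity = n - rank(d)

# D is strictly upper triangular, so det(D - lam) = (-lam)^n and the
# algebraic multiplicity of 0 is n.
strictly_upper = all(d[i][j] == 0 for i in range(n) for j in range(n) if i >= j)
algebraic_multiplicity = n if strictly_upper else None
```

For $n = 4$ the geometric multiplicity is 1 while the algebraic multiplicity is 4, in agreement with the discussion above.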

By means of proper values and their algebraic multiplicities we can 
characterize two interesting functions of linear transformations; one of 
them is the determinant and the other is something new. (Warning: these 
characterizations are valid only under our current assumption that the 
scalar field is algebraically closed.) 

Let $A$ be any linear transformation on an $n$-dimensional vector space, 
and let $\lambda_1, \dots, \lambda_p$ be its distinct proper values. Let us denote by $m_j$ the 
algebraic multiplicity of $\lambda_j$, $j = 1, \dots, p$, so that $m_1 + \cdots + m_p = n$. For 
any polynomial equation 

$$\alpha_0 + \alpha_1 \lambda + \cdots + \alpha_n \lambda^n = 0,$$

the product of the roots is $(-1)^n \alpha_0/\alpha_n$ and the sum of the roots is 
$-\alpha_{n-1}/\alpha_n$. Since the leading coefficient ($= \alpha_n$) of the characteristic 
polynomial $\det (A - \lambda)$ is $(-1)^n$ and since the constant term ($= \alpha_0$) is 
$\det (A - 0) = \det A$, we have 

$$\det A = \prod_{j=1}^{p} \lambda_j^{m_j}.$$

This characterization of the determinant motivates the definition 

$$\operatorname{tr} A = \sum_{j=1}^{p} m_j \lambda_j;$$

the function so defined is called the trace of $A$. We shall have no occasion 
to use trace in the sequel; we leave the derivation of the basic properties 
of the trace to the interested reader. 


EXERCISES 

1. Find all (complex) proper values and proper vectors of the following matrices. 



2. Let $\pi$ be a permutation of the integers $\{1, \dots, n\}$; if $x = (\xi_1, \dots, \xi_n)$ is a 
vector in $\mathcal{C}^n$, write $Ax = (\xi_{\pi(1)}, \dots, \xi_{\pi(n)})$. Find the spectrum of $A$. 

3. Prove that all the proper values of a projection are 0 or 1 and that all the 
proper values of an involution are $+1$ or $-1$. (This result does not depend on 
the finite-dimensionality of the vector space.) 

4. Suppose that $A$ is a linear transformation and that $p$ is a polynomial. We 
know that if $\lambda$ is a proper value of $A$, then $p(\lambda)$ is a proper value of $p(A)$; what 
can be said about the converse? 

5. Prove that the differentiation operator $D$ on the space $\mathcal{P}_n$ ($n > 1$) is not 
reducible (that is, it is not reduced by any non-trivial pair of complementary 
subspaces $\mathfrak{M}$ and $\mathfrak{N}$). 

6. If $A$ is a linear transformation on a finite-dimensional vector space, and if $\lambda$ 
is a proper value of $A$, then the algebraic multiplicity of $\lambda$ for $A$ is equal to the 
algebraic multiplicity of $\lambda$ for $BAB^{-1}$. (Here $B$ is an arbitrary invertible 
transformation.) 

7. Do $AB$ and $BA$ always have the same spectrum? 

8. Suppose that $A$ and $B$ are linear transformations on finite-dimensional vector 
spaces. 

(a) $\operatorname{tr}(A \oplus B) = \operatorname{tr} A + \operatorname{tr} B$. 

(b) $\operatorname{tr}(A \otimes B) = (\operatorname{tr} A)(\operatorname{tr} B)$. 

(c) The spectrum of $A \oplus B$ is the union of the spectra of $A$ and $B$. 

(d) The spectrum of $A \otimes B$ consists of all the scalars of the form $\alpha\beta$, with $\alpha$ 
and $\beta$ in the spectrum of $A$ and of $B$, respectively. 


§ 56. Triangular form 

It is now quite easy to prove the easiest one of the so-called canonical 
form theorems. Our assumption about the scalar field (namely, that it is 
algebraically closed) is still in force. 

Theorem 1. If $A$ is any linear transformation on an $n$-dimensional vector 
space $\mathcal{V}$, then there exist $n + 1$ subspaces $\mathfrak{M}_0, \mathfrak{M}_1, \dots, \mathfrak{M}_{n-1}, \mathfrak{M}_n$ with the 
following properties: 

(i) each $\mathfrak{M}_j$ ($j = 0, 1, \dots, n - 1, n$) is invariant under $A$, 

(ii) the dimension of $\mathfrak{M}_j$ is $j$, 

(iii) $(\mathcal{O} =)\ \mathfrak{M}_0 \subset \mathfrak{M}_1 \subset \cdots \subset \mathfrak{M}_{n-1} \subset \mathfrak{M}_n\ (= \mathcal{V})$. 

proof. If $n = 0$ or $n = 1$, the result is trivial; we proceed by induction, 
assuming that the statement is correct for $n - 1$. Consider the dual 
transformation $A'$ on $\mathcal{V}'$; since it has at least one proper vector, say $x'$, there 
exists a one-dimensional subspace $\mathfrak{M}$ invariant under it, namely, the set 
of all multiples of $x'$. Let us denote by $\mathfrak{M}_{n-1}$ the annihilator (in $\mathcal{V}'' = \mathcal{V}$) 
of $\mathfrak{M}$, $\mathfrak{M}_{n-1} = \mathfrak{M}^0$; then $\mathfrak{M}_{n-1}$ is an $(n - 1)$-dimensional subspace of $\mathcal{V}$, 
and $\mathfrak{M}_{n-1}$ is invariant under $A$. Consequently we may consider $A$ as a 
linear transformation on $\mathfrak{M}_{n-1}$ alone, and we may find $\mathfrak{M}_0, \mathfrak{M}_1, \dots, \mathfrak{M}_{n-2}$, 
$\mathfrak{M}_{n-1}$, satisfying the conditions (i), (ii), (iii). We write $\mathfrak{M}_n = \mathcal{V}$, and we 
are done. 

The chief interest of this theorem comes from its matricial 
interpretation. Since $\mathfrak{M}_1$ is one-dimensional, we may find in it a vector $x_1 \ne 0$. 
Since $\mathfrak{M}_1 \subset \mathfrak{M}_2$, it follows that $x_1$ is also in $\mathfrak{M}_2$, and since $\mathfrak{M}_2$ is 
two-dimensional, we may find in it a vector $x_2$ such that $x_1$ and $x_2$ span $\mathfrak{M}_2$. We 
proceed in this way by induction, choosing vectors $x_j$ so that $x_1, \dots, x_j$ lie in 
$\mathfrak{M}_j$ and span $\mathfrak{M}_j$ for $j = 1, \dots, n$. We obtain finally a basis $\mathcal{X} = \{x_1, 
\dots, x_n\}$ in $\mathcal{V}$; let us compute the matrix of $A$ in this coordinate system. 

Since $x_j$ is in $\mathfrak{M}_j$ and since $\mathfrak{M}_j$ is invariant under $A$, it follows that $Ax_j$ 
must be a linear combination of $x_1, \dots, x_j$. Hence in the expression 

$$Ax_j = \sum_i \alpha_{ij} x_i$$

the coefficient of $x_i$ must vanish whenever $i > j$; in other words, $i > j$ 
implies $\alpha_{ij} = 0$. Hence the matrix of $A$ has the triangular form 

$$[A] = \begin{bmatrix}
\alpha_{11} & \alpha_{12} & \alpha_{13} & \cdots & \alpha_{1n} \\
0 & \alpha_{22} & \alpha_{23} & \cdots & \alpha_{2n} \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & \alpha_{n-1,n} \\
0 & 0 & 0 & \cdots & \alpha_{nn}
\end{bmatrix}$$

It is clear from this representation that $\det (A - \alpha_{ii}) = 0$ for $i = 1, \dots, 
n$, so that the $\alpha_{ii}$ are the proper values of $A$, appearing on the main diagonal 
of $[A]$ with the proper multiplicities. We sum up as follows. 


Theorem 2. If $A$ is a linear transformation on an $n$-dimensional vector
space $\mathcal{V}$, then there exists a basis $\mathcal{X}$ in $\mathcal{V}$ such that the matrix
$[A; \mathcal{X}]$ is triangular; or, equivalently, if $[A]$ is any matrix, there
exists a non-singular matrix $[B]$ such that $[B]^{-1}[A][B]$ is triangular.

The triangular form is useful for proving many results about linear
transformations. It follows from it, for example, that for any polynomial
$p$, the proper values of $p(A)$, including their algebraic multiplicities, are
precisely the numbers $p(\lambda)$, where $\lambda$ runs through the proper values of $A$.
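
The last statement can be checked numerically on a small example. The
following Python sketch is ours, not part of the text: the helper names
`mat_mul` and `poly_of_matrix` and the particular matrix are assumptions
made only for illustration. It evaluates a polynomial at an upper
triangular matrix and compares the diagonal of the result with the values
of the polynomial at the original diagonal entries.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def poly_of_matrix(coeffs, a):
    """Evaluate c0 + c1*A + c2*A^2 + ... by Horner's rule; coeffs = [c0, c1, ...]."""
    n = len(a)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = mat_mul(result, a)      # multiply the running value by A
        for i in range(n):
            result[i][i] += c            # add c times the identity
    return result

# An upper triangular matrix with diagonal entries 2, 3, 2 (illustrative choice).
A = [[2, 5, -1],
     [0, 3, 4],
     [0, 0, 2]]
p = [1, 0, 1]                            # p(t) = 1 + t^2
pA = poly_of_matrix(p, A)

is_upper = all(pA[i][j] == 0 for i in range(3) for j in range(i))
diag = [pA[i][i] for i in range(3)]
expected = [1 + A[i][i] ** 2 for i in range(3)]
```

Here `p(A)` is again triangular, and its diagonal entries are the numbers
$p(2), p(3), p(2)$, in agreement with the assertion above.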

A large part of the theory of linear transformations is devoted to
improving the triangularization result just obtained. The best thing a
matrix can be is not triangular but diagonal (that is, $\alpha_{ij} = 0$ unless
$i = j$); if a linear transformation is such that its matrix with respect to
a suitable coordinate system is diagonal we shall call the transformation
diagonable.


EXERCISES 

1. Interpret the following matrices as linear transformations on $\mathcal{C}^2$ and,
in each case, find a basis of $\mathcal{C}^2$ such that the matrix of the transformation
with respect to that basis is triangular.

» a !)• 

« (I ;> 

« (! D- 

« C !)• 

2. Two commutative linear transformations on a finite-dimensional vector space
$\mathcal{V}$ over an algebraically closed field can be simultaneously triangularized.
In other words, if $AB = BA$, then there exists a basis $\mathcal{X}$ such that both
$[A; \mathcal{X}]$ and $[B; \mathcal{X}]$ are triangular. [Hint: to imitate the proof in § 56,
it is desirable to find a subspace $\mathcal{M}$ of $\mathcal{V}$ invariant under both $A$ and $B$.
With this in mind, consider any proper value $\lambda$ of $A$ and examine the set of
all solutions of $Ax = \lambda x$ for the role of $\mathcal{M}$.]

3. Formulate and prove the analogues of the results of § 56 for triangular matrices 
below the diagonal (instead of above it). 

4. Suppose that $A$ is a linear transformation on an $n$-dimensional vector space.
For every alternating $n$-linear form $w$, write $Aw$ for the function defined by

\[ (Aw)(x_1, \ldots, x_n) = w(Ax_1, x_2, \ldots, x_n) + w(x_1, Ax_2, \ldots, x_n) + \cdots + w(x_1, x_2, \ldots, Ax_n). \]

Since $Aw$ is an alternating $n$-linear form, and, in fact, $A$ is a linear
transformation on the (one-dimensional) space of such forms, it follows that
$Aw = \tau(A) \cdot w$, where $\tau(A)$ is a scalar.

(a) $\tau(0) = 0$.

(b) $\tau(1) = n$.

(c) $\tau(A + B) = \tau(A) + \tau(B)$.

(d) $\tau(\alpha A) = \alpha\tau(A)$.

(e) If the scalar field has characteristic zero and if $A$ is a projection, then
$\tau(A) = \rho(A)$.

(f) If $(\alpha_{ij})$ is the matrix of $A$ in some coordinate system, then
$\tau(A) = \sum_i \alpha_{ii}$.

(g) $\tau(A') = \tau(A)$.

(h) $\tau(AB) = \tau(BA)$.

(i) For which permutations $\pi$ of the integers $1, \ldots, k$ is it true that
$\tau(A_1 \cdots A_k) = \tau(A_{\pi(1)} \cdots A_{\pi(k)})$ for all $k$-tuples $(A_1, \ldots, A_k)$ of
linear transformations?

(j) If the field of scalars is algebraically closed, then $\tau(A) = \operatorname{tr} A$. (For
this reason trace is usually defined to be $\tau$; the most popular procedure is
to use (f) as the definition.)


5. (a) Suppose that the scalar field has characteristic zero. Prove that if
$E_1, \ldots, E_k$ and $E_1 + \cdots + E_k$ are projections, then $E_iE_j = 0$ whenever
$i \neq j$. (Hint: from the fact that $\operatorname{tr}(E_1 + \cdots + E_k) = \operatorname{tr}(E_1) + \cdots + \operatorname{tr}(E_k)$
conclude that the range of $E_1 + \cdots + E_k$ is the direct sum of the ranges of
$E_1, \ldots, E_k$.)

(b) If $A_1, \ldots, A_k$ are linear transformations on an $n$-dimensional vector
space, and if $A_1 + \cdots + A_k = 1$ and $\rho(A_1) + \cdots + \rho(A_k) \leq n$, then each $A_i$
is a projection and $A_iA_j = 0$ whenever $i \neq j$. (Start with $k = 2$ and proceed
by induction; use a direct sum argument as in (a).)


6. (a) If $A$ is a linear transformation on a finite-dimensional vector space over
a field of characteristic zero, and if $\operatorname{tr} A = 0$, then there exists a basis
$\mathcal{X}$ such that if $[A; \mathcal{X}] = (\alpha_{ij})$, then $\alpha_{ii} = 0$ for all $i$. (Hint: using
the fact that $A$ is not a scalar, prove first that there exists a vector $x$ such
that $x$ and $Ax$ are linearly independent. This proves that $\alpha_{11}$ can be made
to vanish; proceed by induction.)

(b) Show that if the characteristic is not zero, the conclusion of (a) is false.


(Hint: if the characteristic is 2, compute BC — CB, where B ~ 




and C = 


§ 57. Nilpotence 


As an aid to getting a representation theorem more informative than the
triangular one, we proceed to introduce and to study a very special but
useful class of transformations. A linear transformation $A$ is called
nilpotent if there exists a strictly positive integer $q$ such that $A^q = 0$;
the least such integer $q$ is the index of nilpotence.
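
For a concrete matrix the index of nilpotence can be found by brute force.
The sketch below is ours (the function name `nilpotency_index` is an
assumption, not notation from the text). It relies on the standard fact,
which also follows from Theorem 1 below, that on an $n$-dimensional space
the index of a nilpotent transformation is at most $n$, so that only the
first $n$ powers need to be inspected.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def nilpotency_index(a):
    """Return the least q >= 1 with A^q = 0, or None if A is not nilpotent."""
    n = len(a)
    power = a                                  # holds A^q at the start of step q
    for q in range(1, n + 1):
        if all(entry == 0 for row in power for entry in row):
            return q
        power = mat_mul(power, a)
    return None                                # A^n != 0, so A is not nilpotent

N = [[0, 1, 0],
     [0, 0, 1],
     [0, 0, 0]]
index_of_N = nilpotency_index(N)               # N^2 != 0 but N^3 = 0
index_of_identity = nilpotency_index([[1, 0], [0, 1]])
index_of_zero = nilpotency_index([[0, 0], [0, 0]])
```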

Theorem 1. If $A$ is a nilpotent linear transformation of index $q$ on a
finite-dimensional vector space $\mathcal{V}$, and if $x_0$ is a vector for which
$A^{q-1}x_0 \neq 0$, then the vectors $x_0, Ax_0, \ldots, A^{q-1}x_0$ are linearly
independent. If $\mathcal{H}$ is the subspace spanned by these vectors, then there
exists a subspace $\mathcal{K}$ such that $\mathcal{V} = \mathcal{H} \oplus \mathcal{K}$ and such that the pair
$(\mathcal{H}, \mathcal{K})$ reduces $A$.

proof. To prove the asserted linear independence, suppose that
$\sum_{i=0}^{q-1} \alpha_i A^i x_0 = 0$, and let $j$ be the least index such that
$\alpha_j \neq 0$. (We do not exclude the possibility $j = 0$.) Dividing through by
$-\alpha_j$ and changing the notation in an obvious way, we obtain a relation of
the form

\[ A^j x_0 = \sum_{i=j+1}^{q-1} \alpha_i A^i x_0 = A^{j+1}\Bigl(\sum_{i=j+1}^{q-1} \alpha_i A^{i-j-1} x_0\Bigr) = A^{j+1} y. \]

It follows from the definition of $q$ that

\[ A^{q-1}x_0 = A^{q-j-1}A^j x_0 = A^{q-j-1}A^{j+1}y = A^q y = 0; \]

since this contradicts the choice of $x_0$, we must have $\alpha_j = 0$ for each $j$.

It is clear that $\mathcal{H}$ is invariant under $A$; to construct $\mathcal{K}$ we go by
induction on the index $q$ of nilpotence. If $q = 1$, the result is trivial; we
now assume the theorem for $q - 1$. The range $\mathcal{R}$ of $A$ is a subspace that is
invariant under $A$; restricted to $\mathcal{R}$ the linear transformation $A$ is nilpotent
of index $q - 1$. We write $\mathcal{H}_0 = \mathcal{H} \cap \mathcal{R}$ and $y_0 = Ax_0$; then $\mathcal{H}_0$ is
spanned by the linearly independent vectors $y_0, Ay_0, \ldots, A^{q-2}y_0$. The
induction hypothesis may be applied, and we may conclude that $\mathcal{R}$ is the
direct sum of $\mathcal{H}_0$ and some other invariant subspace $\mathcal{K}_0$.

We write $\mathcal{K}_1$ for the set of all vectors $x$ such that $Ax$ is in $\mathcal{K}_0$; it is
clear that $\mathcal{K}_1$ is a subspace. The temptation is great to set $\mathcal{K} = \mathcal{K}_1$ and
to attempt to prove that $\mathcal{K}$ has the desired properties. Unfortunately this
need not be true; $\mathcal{H}$ and $\mathcal{K}_1$ need not be disjoint. (It is true, but we shall
not use the fact, that the intersection of $\mathcal{H}$ and $\mathcal{K}_1$ is contained in the
null-space of $A$.) That, in spite of this, $\mathcal{K}_1$ is useful is caused by the fact
that $\mathcal{H} + \mathcal{K}_1 = \mathcal{V}$. To prove this, observe that $Ax$ is in $\mathcal{R}$ for every $x$,
and, consequently, $Ax = y + z$ with $y$ in $\mathcal{H}_0$ and $z$ in $\mathcal{K}_0$. The general
element of $\mathcal{H}_0$ is a linear combination of $Ax_0, \ldots, A^{q-1}x_0$; hence we have

\[ y = \sum_{i=1}^{q-1} \alpha_i A^i x_0 = A\Bigl(\sum_{i=0}^{q-2} \alpha_{i+1} A^i x_0\Bigr) = Ay_1, \]

where $y_1$ is in $\mathcal{H}$. It follows that $Ax = Ay_1 + z$, or $A(x - y_1) = z$, so that
$A(x - y_1)$ is in $\mathcal{K}_0$. This means that $x - y_1$ is in $\mathcal{K}_1$, so that $x$ is the
sum of an element (namely $y_1$) of $\mathcal{H}$ and an element (namely $x - y_1$) of $\mathcal{K}_1$.

As far as disjointness is concerned, we can say at least that
$\mathcal{H} \cap \mathcal{K}_0 = \mathcal{O}$. To prove this, suppose that $x$ is in $\mathcal{H} \cap \mathcal{K}_0$, and
observe first that $Ax$ is in $\mathcal{H}_0$ (since $x$ is in $\mathcal{H}$). Since $\mathcal{K}_0$ is also
invariant under $A$, the vector $Ax$ belongs to $\mathcal{K}_0$ along with $x$, so that
$Ax = 0$. From this we infer that $x$ is in $\mathcal{H}_0$. (Since $x$ is in $\mathcal{H}$, we have
$x = \sum_{i=0}^{q-1} \alpha_i A^i x_0$; and therefore $0 = Ax = \sum_{i=1}^{q-1} \alpha_{i-1} A^i x_0$;
from the linear independence of the $A^i x_0$ it follows that
$\alpha_0 = \cdots = \alpha_{q-2} = 0$, so that $x = \alpha_{q-1}A^{q-1}x_0$.) We have proved that if
$x$ belongs to $\mathcal{H} \cap \mathcal{K}_0$, then it belongs also to $\mathcal{H}_0 \cap \mathcal{K}_0$, and hence that
$x = 0$.

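The situation now is this: $\mathcal{H}$ and $\mathcal{K}_1$ together span $\mathcal{V}$, and $\mathcal{K}_1$ contains
the two disjoint subspaces $\mathcal{K}_0$ and $\mathcal{H} \cap \mathcal{K}_1$. If we let $\mathcal{K}'_0$ be any
complement of $\mathcal{K}_0 \oplus (\mathcal{H} \cap \mathcal{K}_1)$ in $\mathcal{K}_1$, that is, if

\[ \mathcal{K}'_0 \oplus \mathcal{K}_0 \oplus (\mathcal{H} \cap \mathcal{K}_1) = \mathcal{K}_1, \]

then we may write $\mathcal{K} = \mathcal{K}'_0 \oplus \mathcal{K}_0$; we assert that this $\mathcal{K}$ has the desired
properties. In the first place, $\mathcal{K} \subset \mathcal{K}_1$ and $\mathcal{K}$ is disjoint from
$\mathcal{H} \cap \mathcal{K}_1$; it follows that $\mathcal{H} \cap \mathcal{K} = \mathcal{O}$. In the second place,
$\mathcal{H} \oplus \mathcal{K}$ contains both $\mathcal{H}$ and $\mathcal{K}_1$, so that $\mathcal{H} \oplus \mathcal{K} = \mathcal{V}$. Finally,
$\mathcal{K}$ is invariant under $A$, since the fact that $\mathcal{K} \subset \mathcal{K}_1$ implies that
$A\mathcal{K} \subset \mathcal{K}_0 \subset \mathcal{K}$. The proof of the theorem is complete.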

Later we shall need the following remark. If $\tilde{x}_0$ is any other vector for
which $A^{q-1}\tilde{x}_0 \neq 0$, if $\tilde{\mathcal{H}}$ is the subspace spanned by the vectors
$\tilde{x}_0, A\tilde{x}_0, \ldots, A^{q-1}\tilde{x}_0$, and if, finally, $\tilde{\mathcal{K}}$ is any subspace that
together with $\tilde{\mathcal{H}}$ reduces $A$, then the behavior of $A$ on $\tilde{\mathcal{H}}$ and $\tilde{\mathcal{K}}$ is
the same as its behavior on $\mathcal{H}$ and $\mathcal{K}$ respectively. (In other words, in
spite of the apparent non-uniqueness in the statement of Theorem 1,
everything is in fact uniquely determined up to isomorphisms.) The truth of
this remark follows from the fact that the index of nilpotence of $A$ on
$\mathcal{K}$ ($r$, say) is the same as the index of nilpotence of $A$ on $\tilde{\mathcal{K}}$
($\tilde{r}$, say). This fact, in turn, is proved as follows. Since
$A^r\mathcal{V} = A^r\mathcal{H} + A^r\mathcal{K}$ and also $A^r\mathcal{V} = A^r\tilde{\mathcal{H}} + A^r\tilde{\mathcal{K}}$ (these
results depend on the invariance of all the subspaces involved), it follows
that the dimensions of the right sides of these equations may be equated,
and hence that $(q - r) + 0 = (q - r) + (\tilde{r} - r)$.

Using Theorem 1 we can find a complete geometric characterization of 
nilpotent transformations. 

Theorem 2. If $A$ is a nilpotent linear transformation of index $q$ on a
finite-dimensional vector space $\mathcal{V}$, then there exist positive integers
$r, q_1, \ldots, q_r$ and vectors $x_1, \ldots, x_r$ such that (i) $q_1 \geq \cdots \geq q_r$,
(ii) the vectors

\[
\begin{array}{l}
x_1, Ax_1, \ldots, A^{q_1-1}x_1, \\
x_2, Ax_2, \ldots, A^{q_2-1}x_2, \\
\qquad \vdots \\
x_r, Ax_r, \ldots, A^{q_r-1}x_r
\end{array}
\]

form a basis for $\mathcal{V}$, and (iii) $A^{q_1}x_1 = A^{q_2}x_2 = \cdots = A^{q_r}x_r = 0$. The
integers $r, q_1, \ldots, q_r$ form a complete set of isomorphism invariants of $A$.
If, in other words, $B$ is any other nilpotent linear transformation on a
finite-dimensional vector space $\mathcal{W}$, then a necessary and sufficient condition
that there exist an isomorphism $T$ between $\mathcal{V}$ and $\mathcal{W}$ such that
$TAT^{-1} = B$ is that the integers $r, q_1, \ldots, q_r$ attached to $B$ be the same
as the ones attached to $A$.

proof. We write $q_1 = q$ and we choose $x_1$ to be any vector for which
$A^{q_1-1}x_1 \neq 0$. The subspace spanned by $x_1, Ax_1, \ldots, A^{q_1-1}x_1$ is
invariant under $A$, and, by Theorem 1, possesses an invariant complement,
which, naturally, has strictly lower dimension than $\mathcal{V}$. On this
complementary subspace $A$ is nilpotent of index $q_2$, say; we apply the same
reduction procedure to this subspace (beginning with a vector $x_2$ for which
$A^{q_2-1}x_2 \neq 0$). We continue thus by induction till we exhaust the space.
This proves the existential part of the theorem; the remaining part follows
from the uniqueness (up to isomorphisms) of the decomposition given by
Theorem 1.

With respect to the basis $\{A^i x_j\}$ described in Theorem 2, the matrix of
$A$ takes on a particularly simple form. Every matrix element not on the
diagonal just below the main diagonal vanishes (that is, $\alpha_{ij} \neq 0$ implies
$j = i - 1$), and the elements below the main diagonal begin (at top) with
a string of 1's followed by a single 0, then go on with another string of 1's
followed by a 0, and continue so on to the end, with the lengths of the
strings of 1's monotonely decreasing (or, at any rate, non-increasing).
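
The shape of this matrix is easy to generate mechanically. The following
sketch is ours; the function name `nilpotent_normal_form` is hypothetical,
and the convention that column $j$ holds the coordinates of the image of
the $j$-th basis vector is the one used for matrices in § 56. It builds
the matrix from the integers $q_1, \ldots, q_r$ of Theorem 2.

```python
def nilpotent_normal_form(chain_lengths):
    """Matrix of A in the basis x_1, A x_1, ..., A^(q_1 - 1) x_1, x_2, A x_2, ...

    chain_lengths is the list [q_1, ..., q_r]; column j of the result holds
    the coordinates of A applied to the j-th basis vector.
    """
    n = sum(chain_lengths)
    m = [[0] * n for _ in range(n)]
    start = 0
    for q in chain_lengths:
        for k in range(q - 1):
            # A sends the k-th vector of this block to the (k + 1)-th one;
            # the last vector of the block is sent to 0.
            m[start + k + 1][start + k] = 1
        start += q
    return m

M = nilpotent_normal_form([3, 2])
subdiagonal = [M[i + 1][i] for i in range(4)]   # entries just below the main diagonal
number_of_ones = sum(sum(row) for row in M)
```

For $q_1 = 3$ and $q_2 = 2$ the entries just below the main diagonal read
1, 1, 0, 1: a string of two 1's, a single 0, and then a string of one 1.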

Observe that our standing assumption about the algebraic closure of the 
field of scalars was not used in this section. 


EXERCISES 

1. Does there exist a nilpotent transformation of index 3 on a 2-dimensional 
space? 

2. (a) Prove that a nilpotent linear transformation on a finite-dimensional 
vector space has trace zero. 

(b) Prove that if $A$ and $B$ are linear transformations (on the same
finite-dimensional vector space) and if $C = AB - BA$, then $1 - C$ is not
nilpotent.

3. Prove that if $A$ is a nilpotent linear transformation of index $q$ on a
finite-dimensional vector space, then

\[ \nu(A^{k+1}) + \nu(A^{k-1}) \leq 2\nu(A^k) \]

for $k = 1, \ldots, q - 1$.

4. If $A$ is a linear transformation (on a finite-dimensional vector space over an
algebraically closed field), then there exist linear transformations $B$ and $C$
such that $A = B + C$, $B$ is diagonable, $C$ is nilpotent, and $BC = CB$; the
transformations $B$ and $C$ are uniquely determined by these conditions.


§ 58. Jordan form 

It is sound geometric intuition that makes most of us conjecture that, 
for linear transformations, being invertible and being in some sense zero 
are exactly opposite notions. Our disappointment in finding that the range 
and the null-space need not be disjoint is connected with this conjecture. 
The situation can be straightened out by relaxing the sense in which we 
interpret “being zero”; for most practical purposes a linear transformation 
some power of which is zero (that is, a nilpotent transformation) is as zeroish 
as we can expect it to be. Although we cannot say that a linear transforma- 
tion is either invertible or “zero” even in the extended sense of zeroness, we 
can say how any transformation is made up of these two extreme kinds.

Theorem 1. Every linear transformation $A$ on a finite-dimensional vector
space $\mathcal{V}$ is the direct sum of a nilpotent transformation and an invertible
transformation.

proof. We consider the null-space of the $k$-th power of $A$; this is a
subspace $\mathcal{N}_k = \mathcal{N}(A^k)$. Clearly $\mathcal{N}_1 \subset \mathcal{N}_2 \subset \cdots$. We assert first
that if ever $\mathcal{N}_k = \mathcal{N}_{k+1}$, then $\mathcal{N}_k = \mathcal{N}_{k+j}$ for all positive integers
$j$. Indeed, if $A^{k+j}x = 0$, then $A^{k+1}A^{j-1}x = 0$, whence (by the fact that
$\mathcal{N}_k = \mathcal{N}_{k+1}$) it follows that $A^kA^{j-1}x = 0$, and therefore that
$A^{k+j-1}x = 0$. In other words, $\mathcal{N}_{k+j}$ is contained in (and therefore equal
to) $\mathcal{N}_{k+j-1}$; induction on $j$ establishes our assertion.

Since $\mathcal{V}$ is finite-dimensional, the subspaces $\mathcal{N}_k$ cannot continue to
increase indefinitely; let $q$ be the smallest positive integer for which
$\mathcal{N}_q = \mathcal{N}_{q+1}$. It is clear that $\mathcal{N}_q$ is invariant under $A$ (in fact each
$\mathcal{N}_k$ is such). We write $\mathcal{R}_k = \mathcal{R}(A^k)$ for the range of $A^k$ (so that,
again, it is clear that $\mathcal{R}_q$ is invariant under $A$); we shall prove that
$\mathcal{V} = \mathcal{N}_q \oplus \mathcal{R}_q$ and that $A$ on $\mathcal{N}_q$ is nilpotent, whereas on $\mathcal{R}_q$ it
is invertible.

If $x$ is a vector common to $\mathcal{N}_q$ and $\mathcal{R}_q$, then $A^q x = 0$ and $x = A^q y$
for some $y$. It follows that $A^{2q}y = 0$, and hence, from the definition of
$q$, that $x = A^q y = 0$. We have shown thus that the range and the null-space
of $A^q$ are disjoint; a dimensionality argument (see § 50, Theorem 1) shows
that they span $\mathcal{V}$, so that $\mathcal{V}$ is their direct sum. It follows from the
definitions of $q$ and $\mathcal{N}_q$ that $A$ on $\mathcal{N}_q$ is nilpotent of index $q$. If,
finally, $x$ is in $\mathcal{R}_q$ (so that $x = A^q y$ for some $y$) and if $Ax = 0$, then
$A^{q+1}y = 0$, whence $x = A^q y = 0$; this shows that $A$ is invertible on
$\mathcal{R}_q$. The proof of Theorem 1 is complete.

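The decomposition of $A$ into its nilpotent and invertible parts is unique.
Suppose, indeed, that $\mathcal{V} = \mathcal{H} \oplus \mathcal{K}$ so that $A$ on $\mathcal{H}$ is nilpotent and
$A$ on $\mathcal{K}$ is invertible. Since $\mathcal{H} \subset \mathcal{N}(A^k)$ for some $k$, it follows that
$\mathcal{H} \subset \mathcal{N}_q$, and, since $\mathcal{K} \subset \mathcal{R}(A^k)$ for all $k$, it follows that
$\mathcal{K} \subset \mathcal{R}_q$; these facts together imply that $\mathcal{H} = \mathcal{N}_q$ and
$\mathcal{K} = \mathcal{R}_q$.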
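
The integer $q$ of the proof can be computed for a concrete matrix by
comparing the ranks of successive powers, since the null-spaces stop
growing exactly when the ranks stop shrinking. The sketch below is ours
and assumes integer or rational entries; the names `rank` and
`stabilization_index` are hypothetical helpers, not notation from the text.

```python
from fractions import Fraction

def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def rank(a):
    """Rank by Gaussian elimination over the rationals (exact arithmetic)."""
    m = [[Fraction(v) for v in row] for row in a]
    rows, cols = len(m), len(m[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, rows):
            factor = m[i][c] / m[r][c]
            m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == rows:
            break
    return r

def stabilization_index(a):
    """Return (q, dim of null-space of A^q, dim of range of A^q), where q is
    the smallest positive integer with rank(A^q) == rank(A^(q+1))."""
    n = len(a)
    power, q = a, 1
    while True:
        nxt = mat_mul(power, a)
        if rank(power) == rank(nxt):
            return q, n - rank(power), rank(power)
        power, q = nxt, q + 1

# A nilpotent 2-by-2 block together with an invertible 1-by-1 block.
A = [[0, 1, 0],
     [0, 0, 0],
     [0, 0, 2]]
result = stabilization_index(A)
```

For this matrix the null-spaces stabilize at $q = 2$; the null-space of
$A^2$ has dimension 2 and the range of $A^2$ has dimension 1, so that the
two dimensions add up to the dimension of the whole space, as the theorem
requires.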

We can now use our results on nilpotent transformations to study the 
structure of arbitrary transformations. The method of getting a nilpotent 
transformation out of an arbitrary one may seem like a conjuring trick, but 
it is a useful trick, which is often employed. What is essential is the guar- 
anteed existence of proper values; for that reason we continue to assume 
that the scalar field is algebraically closed (see § 55). 

Theorem 2. If $A$ is a linear transformation on a finite-dimensional vector
space $\mathcal{V}$, and if $\lambda_1, \ldots, \lambda_p$ are the distinct proper values of $A$ with
respective algebraic multiplicities $m_1, \ldots, m_p$, then $\mathcal{V}$ is the direct
sum of $p$ subspaces $\mathcal{M}_1, \ldots, \mathcal{M}_p$ of respective dimensions
$m_1, \ldots, m_p$, such that each $\mathcal{M}_j$ is invariant under $A$ and such that
$A - \lambda_j$ is nilpotent on $\mathcal{M}_j$.

proof. Take any fixed $j = 1, \ldots, p$, and consider the linear
transformation $A_j = A - \lambda_j$. To $A_j$ we may apply the decomposition of
Theorem 1 to obtain subspaces $\mathcal{M}_j$ and $\mathcal{N}_j$ such that $A_j$ is nilpotent on
$\mathcal{M}_j$ and invertible on $\mathcal{N}_j$. Since $\mathcal{M}_j$ is invariant under $A_j$, it is
also invariant under $A_j + \lambda_j = A$. Hence, for every $\lambda$, the determinant of
$A - \lambda$ is the product of the two corresponding determinants for the two
linear transformations that $A$ becomes when we consider it on $\mathcal{M}_j$ and
$\mathcal{N}_j$ separately. Since the only proper value of $A$ on $\mathcal{M}_j$ is $\lambda_j$, and
since $A$ on $\mathcal{N}_j$ does not have the proper value $\lambda_j$ (that is, $A - \lambda_j$ is
invertible on $\mathcal{N}_j$), it follows that the dimension of $\mathcal{M}_j$ is exactly
$m_j$ and that each of the subspaces $\mathcal{M}_j$ is disjoint from the span of all
the others. A dimension argument proves that
$\mathcal{M}_1 \oplus \cdots \oplus \mathcal{M}_p = \mathcal{V}$ and thereby concludes the proof of the
theorem.

We proceed to describe the principal results of this section and the
preceding one in matricial language. If $A$ is a linear transformation on a
finite-dimensional vector space $\mathcal{V}$, then with respect to a suitable basis
of $\mathcal{V}$, the matrix of $A$ has the following form. Every element not on or
immediately below the main diagonal vanishes. On the main diagonal there
appear the distinct proper values of $A$, each a number of times equal to
its algebraic multiplicity. Below any particular proper value there appear
only 1's and 0's, and these in the following way: there are chains of 1's
followed by a single 0, with the lengths of the chains decreasing as we read
from top to bottom. This matrix is the Jordan form or the classical
canonical form of $A$; we have $B = TAT^{-1}$ if and only if the classical
canonical forms of $A$ and $B$ are the same except for the order of the proper
values. (Thus, in particular, a linear transformation $A$ is diagonable if and
only if its classical canonical form is already diagonal, that is, if every
chain of 1's has length zero.)
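
To make the description concrete, here is a small sketch (ours, with the
hypothetical name `jordan_matrix`) that assembles such a matrix from a
list of pairs consisting of a proper value and a block size. Each block
contributes its proper value on the main diagonal and a chain of 1's
immediately below it, the chain having one entry fewer than the block has
rows.

```python
def jordan_matrix(blocks):
    """Assemble a matrix in the classical canonical form described above.

    blocks is a list of (proper_value, size) pairs; each pair contributes
    `size` diagonal entries equal to proper_value and size - 1 ones
    immediately below the main diagonal.
    """
    n = sum(size for _, size in blocks)
    m = [[0] * n for _ in range(n)]
    start = 0
    for value, size in blocks:
        for k in range(size):
            m[start + k][start + k] = value
            if k + 1 < size:
                m[start + k + 1][start + k] = 1
        start += size
    return m

# Proper value 2 with one chain of length 1, and proper value 3 with a chain
# of length 0 (illustrative data).
J = jordan_matrix([(2, 2), (3, 1)])
D = jordan_matrix([(5, 1), (5, 1), (7, 1)])   # every chain has length zero
```

In the second example every block has size 1, so the matrix is already
diagonal.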

Let us introduce some notation. Let $A$ have $p$ distinct proper values
$\lambda_1, \ldots, \lambda_p$ with algebraic multiplicities $m_1, \ldots, m_p$, as before; let
the number of chains of 1's under $\lambda_j$ be $r_j$, and let the lengths of these
chains be $q_{j,1} - 1, q_{j,2} - 1, \ldots, q_{j,r_j} - 1$. The polynomial $e_{ji}$
defined by $e_{ji}(\lambda) = (\lambda - \lambda_j)^{q_{j,i}}$ is called an elementary divisor of $A$
of multiplicity $q_{j,i}$ belonging to the proper value $\lambda_j$. An elementary
divisor is called simple if its multiplicity is 1 (so that the corresponding
chain length is 0); we see that a linear transformation is diagonable if and
only if its elementary divisors are simple.

To illustrate the power of Theorem 2 we make one application. We
may express the fact that the transformation $A - \lambda_j$ on $\mathcal{M}_j$ is nilpotent
of index $q_{j,1}$ by saying that the transformation $A$ on $\mathcal{M}_j$ is annulled by
the polynomial $e_{j1}$. It follows that $A$ on $\mathcal{V}$ is annulled by the product of
these polynomials (that is, by the product of the elementary divisors of the
highest multiplicities); this product is called the minimal polynomial of $A$.
It is quite easy to see (since the index of nilpotence of $A - \lambda_j$ on $\mathcal{M}_j$
is exactly $q_{j,1}$) that this polynomial is uniquely determined (up to a
multiplicative factor) as the polynomial of smallest degree that annuls $A$.
Since the characteristic polynomial of $A$ is the product of all the
elementary divisors, and therefore a multiple of the minimal polynomial, we
obtain the Hamilton-Cayley equation: every linear transformation is annulled
by its characteristic polynomial.
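
For two-by-two matrices the Hamilton-Cayley equation can be verified
directly, since the characteristic polynomial is then
$\lambda^2 - (\operatorname{tr} A)\lambda + \det A$. The sketch below is ours (the
function name `char_poly_at_matrix` and the sample matrices are
assumptions chosen for illustration); it also shows a case in which a
first-degree polynomial fails to annul the matrix while its square
succeeds.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def char_poly_at_matrix(a):
    """For a 2-by-2 matrix A, return A^2 - (tr A) A + (det A) 1."""
    (p, q), (r, s) = a
    trace, det = p + s, p * s - q * r
    a2 = mat_mul(a, a)
    return [[a2[i][j] - trace * a[i][j] + (det if i == j else 0)
             for j in range(2)] for i in range(2)]

ZERO = [[0, 0], [0, 0]]
A = [[1, 2], [3, 4]]
J = [[2, 0], [1, 2]]            # proper value 2 with a chain of length 1
J_minus_2 = [[0, 0], [1, 0]]    # J - 2, written out by hand

annulled_A = char_poly_at_matrix(A) == ZERO
annulled_J = char_poly_at_matrix(J) == ZERO
first_power_vanishes = J_minus_2 == ZERO
second_power_vanishes = mat_mul(J_minus_2, J_minus_2) == ZERO
```

For the matrix `J` the polynomial $\lambda - 2$ does not annul it but
$(\lambda - 2)^2$ does, so that here the minimal polynomial coincides with the
characteristic polynomial.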


EXERCISES

1. Find the Jordan form of
\[ \begin{pmatrix} 1 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix}. \]


2. What is the maximum number of pairwise non-similar linear transformations
on a three-dimensional vector space, each of which has the characteristic
polynomial $(\lambda - 1)^3$?

3. Does every invertible linear transformation have a square root? (To say that
$A$ is a square root of $B$ means, of course, that $A^2 = B$.)

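4. (a) Prove that if $\omega$ is a cube root of 1 ($\omega \neq 1$), then the matrices

\[ \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 1 & 0 & 0 \\ 0 & \omega & 0 \\ 0 & 0 & \omega^2 \end{pmatrix} \]

are similar.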

(b) Discover and prove a generalization of (a) to higher dimensions. 

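5. (a) Prove that the matrices

\[ \begin{pmatrix} 0 & 1 & \alpha \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix} \]

are similar.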


(b) Discover and prove a generalization of (a) to higher dimensions. 

6. (a) Show that the matrices 


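\[ \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 3 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \]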


are similar (over, say, the field of complex numbers). 

(b) Discover and prove a generalization of (a) to higher dimensions. 

7. If two real matrices are similar over $\mathcal{C}$, then they are similar over $\mathcal{R}$.

8. Prove that every matrix is similar to its transpose. 

9. If $A$ and $B$ are $n$-by-$n$ matrices such that the $2n$-by-$2n$ matrices
$\begin{pmatrix} A & 0 \\ 0 & A \end{pmatrix}$ and $\begin{pmatrix} B & 0 \\ 0 & B \end{pmatrix}$ are similar, then $A$ and $B$ are similar.





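10. Which of the following matrices are diagonable (over the field of complex
numbers)?

\[
\text{(a)}\ \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}, \qquad
\text{(b)}\ \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad
\text{(c)}\ \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix}.
\]

What about the field of real numbers?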


11. Show that the matrix

\[ \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix} \]

is diagonable over the field of complex numbers but not over the field of real
numbers.


12. Let $\pi$ be a permutation of the integers $\{1, \ldots, n\}$; if
$x = (\xi_1, \ldots, \xi_n)$ is a vector in $\mathcal{C}^n$, write
$Ax = (\xi_{\pi(1)}, \ldots, \xi_{\pi(n)})$. Prove that $A$ is diagonable and find a basis
with respect to which the matrix of $A$ is diagonal.

13. Suppose that $A$ is a linear transformation and that $\mathcal{M}$ is a subspace
invariant under $A$. Prove that if $A$ is diagonable, then so also is the
restriction of $A$ to $\mathcal{M}$.

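14. Under what conditions on the complex numbers $\alpha_1, \ldots, \alpha_n$ is the matrix

\[ \begin{bmatrix} 0 & \cdots & 0 & \alpha_1 \\ 0 & \cdots & \alpha_2 & 0 \\ \vdots & & \vdots & \vdots \\ \alpha_n & \cdots & 0 & 0 \end{bmatrix} \]

diagonable (over the field of complex numbers)?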

15. Are the following assertions true or false? 

(a) A real two-by-two matrix with a negative determinant is similar to a diagonal 
matrix. 

(b) If $A$ is a linear transformation on a complex vector space, and if $A^k = 1$
for some positive integer $k$, then $A$ is diagonable.

(c) If A is a nilpotent linear transformation on a finite-dimensional vector 
space, then A is diagonable. 

16. If $A$ is a linear transformation on a finite-dimensional vector space over an
algebraically closed field, and if every proper value of $A$ has algebraic
multiplicity 1, then $A$ is diagonable.

17. If the minimal polynomial of a linear transformation $A$ on an $n$-dimensional
vector space has degree $n$, then $A$ is diagonable.

18. Find the minimal polynomials of all projections and all involutions. 

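19. What is the minimal polynomial of the matrix

\[ \begin{bmatrix} \lambda_1 & 0 & 0 & \cdots & 0 \\ 0 & \lambda_2 & 0 & \cdots & 0 \\ 0 & 0 & \lambda_3 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & \lambda_n \end{bmatrix}? \]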


20. (a) What is the minimal polynomial of the differentiation operator on $\mathcal{P}_n$?
(b) What is the minimal polynomial of the transformation $A$ on $\mathcal{P}_n$ defined by
$(Ax)(t) = x(t + 1)$?

21. If $A$ is a linear transformation with minimal polynomial $p$, and if $q$ is a
polynomial such that $q(A) = 0$, then $q$ is divisible by $p$.

22. (a) If $A$ and $B$ are linear transformations, if $p$ is a polynomial such that
$p(AB) = 0$, and if $q(t) = tp(t)$, then $q(BA) = 0$.

(b) What can be inferred from (a) about the relation between the minimal 
polynomials of AB and of BA? 

23. A linear transformation is invertible if and only if the constant term of its 
minimal polynomial is different from zero. 



CHAPTER III 


ORTHOGONALITY 


§ 59. Inner products

Let us now get our feet back on the ground. We started in Chapter I
by pointing out that we wish to generalize certain elementary properties
of certain elementary spaces such as $\mathcal{R}^2$. In our study so far we have done
this, but we have entirely omitted from consideration one aspect of $\mathcal{R}^2$.
We have studied the qualitative concept of linearity; what we have entirely
ignored are the usual quantitative concepts of angle and length. In the
present chapter we shall fill this gap; we shall superimpose on the vector
spaces to be studied certain numerical functions, corresponding to the
ordinary notions of angle and length, and we shall study the new structure
(vector space plus given numerical function) so obtained. For the added
depth of geometric insight we gain in this way, we must sacrifice some
generality; throughout the rest of this book we shall have to assume that
the underlying field of scalars is either the field $\mathcal{R}$ of real numbers or
the field $\mathcal{C}$ of complex numbers.

For a clue as to how to proceed, we first inspect $\mathcal{R}^2$. If $x = (\xi_1, \xi_2)$
and $y = (\eta_1, \eta_2)$ are any two points in $\mathcal{R}^2$, the usual formula for the
distance between $x$ and $y$, or the length of the segment joining $x$ and $y$, is
$\sqrt{(\xi_1 - \eta_1)^2 + (\xi_2 - \eta_2)^2}$. It is convenient to introduce the notation

\[ \|x\| = \sqrt{\xi_1^2 + \xi_2^2} \]

for the distance from $x$ to the origin $0 = (0, 0)$; in this notation the
distance between $x$ and $y$ becomes $\|x - y\|$.

So much, for the present, for lengths and distances; what about angles?
It turns out that it is much more convenient to study, in the general case,
not any of the usual measures of angles but rather their cosines. (Roughly
speaking, the reason for this is that the angle, in the usual picture in the
circle of radius one, is the length of a certain circular arc, whereas the
cosine of the angle is the length of a line segment; the latter is much easier
to relate to our preceding study of linear functions.) Suppose then that
we let $\alpha$ be the angle between the segment from 0 to $x$ and the positive
$\xi_1$ axis, and let $\beta$ be the angle between the segment from 0 to $y$ and the
same axis; the angle between the two vectors $x$ and $y$ is $\alpha - \beta$, so that
its cosine is

\[ \cos (\alpha - \beta) = \cos \alpha \cos \beta + \sin \alpha \sin \beta = \frac{\xi_1\eta_1 + \xi_2\eta_2}{\|x\| \cdot \|y\|}. \]

Consider the expression $\xi_1\eta_1 + \xi_2\eta_2$; by means of it we can express both
angle and length by very simple formulas. We have already seen that if
we know the distance between 0 and $x$ for all $x$, then we can compute the
distance between any $x$ and $y$; we assert now that if for every pair of
vectors $x$ and $y$ we are given the value of $\xi_1\eta_1 + \xi_2\eta_2$, then in terms of
this value we may compute all distances and all angles. Indeed, if we take
$x = y$, then $\xi_1\eta_1 + \xi_2\eta_2$ becomes $\xi_1^2 + \xi_2^2 = \|x\|^2$, and this takes care
of lengths; the cosine formula above gives us the angle in terms of
$\xi_1\eta_1 + \xi_2\eta_2$ and the two lengths $\|x\|$ and $\|y\|$. To have a concise
notation, let us write, for $x = (\xi_1, \xi_2)$ and $y = (\eta_1, \eta_2)$,

\[ \xi_1\eta_1 + \xi_2\eta_2 = (x, y); \]

what we said above is summarized by the relations

\[
\begin{aligned}
\text{distance from 0 to } x &= \|x\| = \sqrt{(x, x)}, \\
\text{distance from } x \text{ to } y &= \|x - y\|, \\
\text{cosine of angle between } x \text{ and } y &= \frac{(x, y)}{\|x\| \cdot \|y\|}.
\end{aligned}
\]
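
The three relations just summarized translate directly into a few lines of
Python. The sketch is ours; the function names `inner`, `norm`, `distance`,
and `cosine` are hypothetical, and vectors of the real plane are
represented as pairs of floats.

```python
import math

def inner(x, y):
    """The expression xi_1 * eta_1 + xi_2 * eta_2 for vectors of the real plane."""
    return x[0] * y[0] + x[1] * y[1]

def norm(x):
    """Distance from 0 to x."""
    return math.sqrt(inner(x, x))

def distance(x, y):
    """Distance from x to y, computed as the norm of x - y."""
    return norm((x[0] - y[0], x[1] - y[1]))

def cosine(x, y):
    """Cosine of the angle between two non-zero vectors x and y."""
    return inner(x, y) / (norm(x) * norm(y))

x, y = (3.0, 4.0), (4.0, 3.0)
length_of_x = norm(x)           # sqrt(9 + 16)
gap = distance(x, y)            # sqrt(1 + 1)
cos_angle = cosine(x, y)        # 24 / (5 * 5)
```

Everything here is computed from the single function `inner`, which is the
point of the discussion above.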


The important properties of $(x, y)$, considered as a numerical function of
the pair of vectors $x$ and $y$, are the following: it is symmetric in $x$ and $y$,
it depends linearly on each of its two variables, and (unless $x = 0$) the value
of $(x, x)$ is always strictly positive. (The notational conflict between the
use of parentheses in $(x, y)$ and in $(\xi_1, \xi_2)$ is only apparent. It could arise
in two-dimensional spaces only, and even there confusion is easily avoided.)

Observe for a moment the much more trivial picture in $\mathcal{R}^1$. For
$x = (\xi_1)$ and $y = (\eta_1)$ we should have, in this case, $(x, y) = \xi_1\eta_1$ (and
it is for this reason that $(x, y)$ is known as the inner product or scalar
product of $x$ and $y$). The angle between any two vectors is either 0 or $\pi$,
so that its cosine is either $+1$ or $-1$. This shows up the much greater
sensitivity of the function given by $(x, y)$, which takes on all possible
numerical values.


§ 60. Complex inner products

What happens if we want to consider $\mathcal{C}^2$ instead of $\mathcal{R}^2$? The
generalization seems to lie right at hand; for $x = (\xi_1, \xi_2)$ and
$y = (\eta_1, \eta_2)$ (where now the $\xi$'s and $\eta$'s may be complex numbers), we
write $(x, y) = \xi_1\eta_1 + \xi_2\eta_2$, and we hope that the expressions
$\|x\| = \sqrt{(x, x)}$ and $\|x - y\|$ can be used as sensible measures of
distance. Observe, however, the following strange phenomenon (where
$i = \sqrt{-1}$):

\[ \|ix\|^2 = (ix, ix) = i(x, ix) = i^2(x, x) = -\|x\|^2. \]

This means that if $\|x\|$ is positive, that is, if $x$ is at a positive distance
from the origin, then $ix$ is not; in fact the distance from 0 to $ix$ is
imaginary. This is very unpleasant; surely it is reasonable to demand that
whatever it is that is going to play the role of $(x, y)$ in this case, it
should have the property that for $x = y$ it never becomes negative. A formal
remedy lies close at hand; we could try to write

\[ (x, y) = \xi_1\bar{\eta}_1 + \xi_2\bar{\eta}_2 \]

(where the bar denotes complex conjugation). In this definition the
expression $(x, y)$ loses much of its former beauty; it is no longer quite
symmetric in $x$ and $y$ and it is no longer quite linear in each of its
variables. But, and this is what prompted us to give our new definition,

\[ (x, x) = \xi_1\bar{\xi}_1 + \xi_2\bar{\xi}_2 = |\xi_1|^2 + |\xi_2|^2 \]

is surely never negative. It is a priori dubious whether a useful and elegant
theory can be built up on the basis of a function that fails to possess so
many of the properties that recommended it to our attention in the first
place; the apparent inelegance will be justified in what follows by its
success. A cheerful portent is this. Consider the space $\mathcal{C}^1$ (that is, the
set of all complex numbers). It is impossible to draw a picture of any
configuration in this space and then to be able to tell it apart from a
configuration in $\mathcal{R}^2$, but conceptually it is clearly a different space. The
analogue of $(x, y)$ in this space, for $x = (\xi_1)$ and $y = (\eta_1)$, is given by
$(x, y) = \xi_1\bar{\eta}_1$, and this expression does have a simple geometric
interpretation. If we join $x$ and $y$ to the origin by straight line segments,
$(x, y)$ will not, to be sure, be the cosine of the angle between the two
segments; it turns out that, for $\|x\| = \|y\| = 1$, its real part is exactly
that cosine.
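
Both phenomena can be observed with Python's built-in complex numbers.
The sketch is ours: the names `bilinear` and `hermitian` are hypothetical
labels for the two candidate definitions discussed above, and the
particular vectors and angles are chosen only for illustration.

```python
import cmath
import math

def bilinear(x, y):
    """The naive candidate: sum of xi_k * eta_k, with no conjugation."""
    return sum(a * b for a, b in zip(x, y))

def hermitian(x, y):
    """The remedy: sum of xi_k * conjugate(eta_k)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

x = (1 + 0j, 2 + 0j)
ix = tuple(1j * a for a in x)

naive_x = bilinear(x, x)          # 1 + 4
naive_ix = bilinear(ix, ix)       # i^2 * (1 + 4), a negative number
fixed_ix = hermitian(ix, ix)      # |i|^2 + |2i|^2, positive again

# In the space of complex numbers, two vectors of length one at angles
# theta and phi: the real part of (u, v) is the cosine of the angle between them.
theta, phi = 0.7, 0.2
u = (cmath.exp(1j * theta),)
v = (cmath.exp(1j * phi),)
real_part = hermitian(u, v).real
```

With the naive definition the "squared length" of `ix` comes out as $-5$,
while the conjugated definition gives $5$ for both `x` and `ix`.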

The complex conjugates that we were forced to introduce here will come
back to plague us later; for the present we leave this heuristic introduction
and turn to the formal work, after just one more comment on the notation.
The similarity of the symbols (,) and [,], the one used here for inner product
and the other used earlier for linear functionals, is not accidental. We
shall show later that it is, in fact, only the presence of the complex
conjugation in (,) that makes it necessary to use for it a symbol different
from [,]. For the present, however, we cannot afford the luxury of confusing
the two.


§ 61. Inner product spaces

Definition. An inner product in a (real or complex) vector space is a
(respectively, real or complex) numerically valued function of the ordered
pair of vectors $x$ and $y$, such that

(1) $(x, y) = \overline{(y, x)}$,

(2) $(\alpha_1x_1 + \alpha_2x_2, y) = \alpha_1(x_1, y) + \alpha_2(x_2, y)$,

(3) $(x, x) \geq 0$; $(x, x) = 0$ if and only if $x = 0$.

An inner product space is a vector space with an inner product. 

We observe that in the case of a real vector space, the conjugation in (1)
may be ignored. In any case, however, real or complex, (1) implies that
$(x, x)$ is always real, so that the inequality in (3) makes sense. In an inner
product space we shall use the notation

\[ \sqrt{(x, x)} = \|x\|; \]

the number $\|x\|$ is called the norm or length of the vector $x$. A real inner
product space is sometimes called a Euclidean space; its complex analogue
is called a unitary space.

As examples of unitary spaces we may consider $\mathcal{C}^n$ and $\mathcal{P}$; in the first
case we write, for $x = (\xi_1, \ldots, \xi_n)$ and $y = (\eta_1, \ldots, \eta_n)$,

\[ (x, y) = \sum_{i=1}^{n} \xi_i\bar{\eta}_i, \]

and, in $\mathcal{P}$, we write

\[ (x, y) = \int_0^1 x(t)\overline{y(t)}\, dt. \]

The modifications that convert these examples into Euclidean spaces (that
is, real inner product spaces) are obvious.

In a unitary space we have

(2') $(x, \alpha_1 y_1 + \alpha_2 y_2) = \bar{\alpha}_1 (x, y_1) + \bar{\alpha}_2 (x, y_2)$.

(To transform the left side of (2') into the right side, use (1), expand by 
(2), and use (1) again.) This fact, together with the definition of an inner 
product, explains the terminology sometimes used to describe properties
(1), (2), (3) (and their consequence (2')). According to that terminology
(x, y) is a Hermitian symmetric (1), conjugate bilinear ((2) and (2')), and
positive definite (3) form. In a Euclidean space the conjugation in (2') may
be ignored along with the conjugation in (1); in that case (x, y) is called a
symmetric, bilinear, and positive definite form. We observe that in either
case, the conditions on (x, y) imply for $\|x\|$ the homogeneity property

$\|\alpha x\| = |\alpha| \cdot \|x\|$.

(Proof: $\|\alpha x\|^2 = (\alpha x, \alpha x) = \alpha \bar{\alpha} (x, x)$.)
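
The properties just listed can be checked numerically in the unitary space of n-tuples of complex numbers. The following is only a sketch under stated assumptions: the helper names `inner` and `norm` and the sample vectors are ours, chosen for illustration, and the inner product is the coordinate one given above (sum of products of the coordinates of x with the conjugated coordinates of y).

```python
import math


def inner(x, y):
    """Coordinate inner product: linear in x, conjugate linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def norm(x):
    """Norm ||x|| = sqrt((x, x)); (x, x) is real by Hermitian symmetry."""
    return math.sqrt(inner(x, x).real)


# Hypothetical sample data, chosen only for illustration.
x = [1 + 2j, 3 - 1j]
y = [2 - 1j, 1j]
alpha = 2 - 3j

# (1) Hermitian symmetry: (x, y) is the conjugate of (y, x).
assert abs(inner(x, y) - inner(y, x).conjugate()) < 1e-12

# (2) Linearity in the first argument.
ax = [alpha * a for a in x]
assert abs(inner(ax, y) - alpha * inner(x, y)) < 1e-12

# (2') Conjugate linearity in the second argument.
ay = [alpha * b for b in y]
assert abs(inner(x, ay) - alpha.conjugate() * inner(x, y)) < 1e-12

# (3) Positivity: (x, x) is real and non-negative.
assert abs(inner(x, x).imag) < 1e-12 and inner(x, x).real >= 0

# Homogeneity of the norm: ||alpha x|| = |alpha| * ||x||.
assert abs(norm(ax) - abs(alpha) * norm(x)) < 1e-12
```

A finite check of this kind proves nothing, of course; it merely makes visible where the conjugation enters.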


§ 62. Orthogonality 


The most important relation among the vectors of an inner product 
space is orthogonality. By definition, the vectors x and y are called or- 
thogonal if (x, y) = 0. We observe that this relation is symmetric; since 
$(x, y) = \overline{(y, x)}$, it follows that (x, y) and (y, x) vanish together. If we
recall the motivation for the introduction of (x, y), the terminology ex- 
plains itself; two vectors are orthogonal (or perpendicular) if the angle 
between them is 90°, that is, if the cosine of the angle between them is 0. 
Two subspaces are called orthogonal if every vector in each is orthogonal 
to every vector in the other. 

A set $\mathcal{X}$ of vectors is orthonormal if whenever both x and y are in $\mathcal{X}$ it
follows that (x, y) = 0 or (x, y) = 1 according as $x \neq y$ or x = y. (If $\mathcal{X}$
is finite, say $\mathcal{X} = \{x_1, \dots, x_n\}$, we have $(x_i, x_j) = \delta_{ij}$.) We call an
orthonormal set complete if it is not contained in any larger orthonormal set.

To make our last definition in this connection, we observe first that an
orthonormal set is linearly independent. Indeed, if $\{x_1, \dots, x_k\}$ is any
finite subset of an orthonormal set $\mathcal{X}$, then $\sum_i \alpha_i x_i = 0$ implies that

$0 = (\sum_i \alpha_i x_i, x_j) = \sum_i \alpha_i (x_i, x_j) = \alpha_j$;

in other words, a linear combination of the x's can vanish only if all the
coefficients vanish. From this we conclude that in a finite-dimensional
inner product space the number of vectors in an orthonormal set is always
finite, and, in fact, not greater than the linear dimension of the space.
We define, in this case, the orthogonal dimension of the space, as the largest 
number of vectors an orthonormal set can contain. 

Warning: for all we know at this stage, the concepts of orthogonality 
and orthonormal sets are vacuous. Trivial examples can be used to show 
that things are not so bad as all that; the vector 0, for instance, is always 
orthogonal to every vector, and, if the space contains a non-zero vector x,
then the set consisting of $x / \|x\|$ alone is an orthonormal set. We grant that
these examples are not very inspiring. For the present, however, we remain
content with them; soon we shall see that there are always “enough”
orthogonal vectors to operate with in comfort. 

Observe also that we have no right to assume that the number of ele- 
ments in a complete orthonormal set is equal to the orthogonal dimension. 
The point is this: if we had an orthonormal set with that many elements, 
it would clearly be complete; it is conceivable, just the same, that some 
other set contains fewer elements, but is still complete because its nasty 
structure precludes the possibility of extending it. These difficulties are 
purely verbal and will evaporate the moment we start proving things; they 
occur only because from among the several possibilities for the definition 
of completeness we had to choose a definite one, and we must prove its 
equivalence with the others. 

We need some notation. If $\mathcal{E}$ is any set of vectors in an inner product
space $\mathcal{V}$, we denote by $\mathcal{E}^\perp$ the set of all vectors in $\mathcal{V}$ that are orthogonal to
every vector in $\mathcal{E}$. It is clear that $\mathcal{E}^\perp$ is a subspace of $\mathcal{V}$ (whether or not $\mathcal{E}$
is one), and that $\mathcal{E}$ is contained in $\mathcal{E}^{\perp\perp} = (\mathcal{E}^\perp)^\perp$. It follows that the
subspace spanned by $\mathcal{E}$ is contained in $\mathcal{E}^{\perp\perp}$. In case $\mathcal{E}$ is a subspace, we shall
call $\mathcal{E}^\perp$ the orthogonal complement of $\mathcal{E}$. We use the sign $\perp$ in order to be
reminded of orthogonality (or perpendicularity). In informal discussions,
$\mathcal{E}^\perp$ might be pronounced as “E perp.”


EXERCISES 

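1. Given four complex numbers $\alpha$, $\beta$, $\gamma$, and $\delta$, try to define an inner product in
$\mathcal{C}^2$ by writing

$(x, y) = \alpha \xi_1 \bar{\eta}_1 + \beta \xi_2 \bar{\eta}_1 + \gamma \xi_1 \bar{\eta}_2 + \delta \xi_2 \bar{\eta}_2$

whenever $x = (\xi_1, \xi_2)$ and $y = (\eta_1, \eta_2)$. Under what conditions on $\alpha$, $\beta$, $\gamma$, and $\delta$
does this equation define an inner product?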

2. Prove that if x and y are vectors in a unitary space, then

$4(x, y) = \|x + y\|^2 - \|x - y\|^2 + i\|x + iy\|^2 - i\|x - iy\|^2$.

3. If inner product in $\mathcal{P}_{n+1}$ is defined by $(x, y) = \int_0^1 x(t) \overline{y(t)} \, dt$, and if
$x_j(t) = t^j$, j = 0, ..., n - 1, find a polynomial of degree n orthogonal to
$x_0, x_1, \dots, x_{n-1}$.

4. (a) Two vectors x and y in a real inner product space are orthogonal if and
only if $\|x + y\|^2 = \|x\|^2 + \|y\|^2$.

(b) Show that (a) becomes false if “real” is changed to “complex.”

(c) Two vectors x and y in a complex inner product space are orthogonal if and
only if $\|\alpha x + \beta y\|^2 = \|\alpha x\|^2 + \|\beta y\|^2$ for all pairs of scalars $\alpha$ and $\beta$.

(d) If x and y are vectors in a real inner product space, and if $\|x\| = \|y\|$,
then x - y and x + y are orthogonal. (Picture?) Discuss the corresponding
statement for complex spaces.

(e) If x and y are vectors in an inner product space, then

$\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2$.

Picture? 


§ 63. Completeness 

Theorem 1. If $\mathcal{X} = \{x_1, \dots, x_n\}$ is any finite orthonormal set in an inner
product space, if x is any vector, and if $\alpha_i = (x, x_i)$, then (Bessel's inequality)

$\sum_i |\alpha_i|^2 \leq \|x\|^2$.

The vector $x' = x - \sum_i \alpha_i x_i$ is orthogonal to each $x_j$ and, consequently, to
the subspace spanned by $\mathcal{X}$.

proof. For the first assertion:

$0 \leq \|x'\|^2 = (x', x') = (x - \sum_i \alpha_i x_i, x - \sum_j \alpha_j x_j)$

$= (x, x) - \sum_i \alpha_i (x_i, x) - \sum_j \bar{\alpha}_j (x, x_j) + \sum_i \sum_j \alpha_i \bar{\alpha}_j (x_i, x_j)$

$= \|x\|^2 - \sum_i |\alpha_i|^2 - \sum_i |\alpha_i|^2 + \sum_i |\alpha_i|^2$

$= \|x\|^2 - \sum_i |\alpha_i|^2$;

for the second assertion:

$(x', x_j) = (x, x_j) - \sum_i \alpha_i (x_i, x_j) = \alpha_j - \alpha_j = 0$.
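
Both assertions of Theorem 1 can be watched at work on a small example. The sketch below assumes the coordinate inner product on triples of complex numbers; the orthonormal pair `xs`, the vector `x`, and the helper name `inner` are hypothetical choices made for illustration only.

```python
import math


def inner(x, y):
    """Coordinate inner product: linear in x, conjugate linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


s = 1 / math.sqrt(2)

# A (non-complete) orthonormal set of two vectors in complex 3-space.
xs = [[1, 0, 0], [0, s, s * 1j]]
x = [2 - 1j, 1 + 1j, 3j]

# The coefficients alpha_i = (x, x_i).
alphas = [inner(x, xi) for xi in xs]

# Bessel's inequality: sum |alpha_i|^2 <= ||x||^2.
bessel_sum = sum(abs(a) ** 2 for a in alphas)
norm_sq = inner(x, x).real
assert bessel_sum <= norm_sq + 1e-12

# The vector x' = x - sum alpha_i x_i is orthogonal to each x_j.
x_prime = [x[k] - sum(a * xi[k] for a, xi in zip(alphas, xs))
           for k in range(len(x))]
for xj in xs:
    assert abs(inner(x_prime, xj)) < 1e-12

# The defect in Bessel's inequality is exactly ||x'||^2.
assert abs(inner(x_prime, x_prime).real - (norm_sq - bessel_sum)) < 1e-12
```

The last assertion is just the chain of equalities in the proof, read from the bottom up.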

Theorem 2. If $\mathcal{X}$ is any finite orthonormal set in an inner product space
$\mathcal{V}$, the following six conditions on $\mathcal{X}$ are equivalent to each other.

(1) The orthonormal set $\mathcal{X}$ is complete.

(2) If $(x, x_i) = 0$ for i = 1, ..., n, then x = 0.

(3) The subspace spanned by $\mathcal{X}$ is the whole space $\mathcal{V}$.

(4) If x is in $\mathcal{V}$, then $x = \sum_i (x, x_i) x_i$.

(5) If x and y are in $\mathcal{V}$, then (Parseval's identity)

$(x, y) = \sum_i (x, x_i)(x_i, y)$.

(6) If x is in $\mathcal{V}$, then

$\|x\|^2 = \sum_i |(x, x_i)|^2$.

PROOF. We shall establish the implications (1) => (2) => (3) => (4) => 
(5) => (6) => (1). Thus we first assume (1) and prove (2), then assume 
(2) to prove (3), and so on till we finally prove (1) assuming (6). 

(1) => (2). If $(x, x_i) = 0$ for all i and $x \neq 0$, then we may adjoin
$x / \|x\|$ to $\mathcal{X}$ and thus obtain an orthonormal set larger than $\mathcal{X}$.

(2) => (3). If there is an x that is not a linear combination of the $x_i$,
then, by the second part of Theorem 1, $x' = x - \sum_i (x, x_i) x_i$ is different
from 0 and is orthogonal to each $x_i$.

(3) => (4). If every x has the form $x = \sum_j \alpha_j x_j$, then

$(x, x_i) = \sum_j \alpha_j (x_j, x_i) = \alpha_i$.

(4) => (5). If $x = \sum_i \alpha_i x_i$ and $y = \sum_j \beta_j x_j$, with $\alpha_i = (x, x_i)$ and
$\beta_j = (y, x_j)$, then

$(x, y) = (\sum_i \alpha_i x_i, \sum_j \beta_j x_j) = \sum_{i,j} \alpha_i \bar{\beta}_j (x_i, x_j) = \sum_i \alpha_i \bar{\beta}_i$.

(5) => (6). Set x = y.

(6) => (1). If $\mathcal{X}$ were contained in a larger orthonormal set, say if $x_0$ is
orthogonal to each $x_i$, then

$\|x_0\|^2 = \sum_i |(x_0, x_i)|^2 = 0$,

so that $x_0 = 0$.
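
Conditions (4), (5), and (6) are easy to test on a concrete complete orthonormal set. The following sketch assumes the coordinate inner product on pairs of complex numbers; the particular set `basis`, the vectors `x` and `y`, and the helper name `inner` are hypothetical and serve only as an illustration.

```python
import math


def inner(x, y):
    """Coordinate inner product: linear in x, conjugate linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


s = 1 / math.sqrt(2)

# A complete orthonormal set in complex 2-space.
basis = [[s, s * 1j], [s, -s * 1j]]
for i, xi in enumerate(basis):
    for j, xj in enumerate(basis):
        assert abs(inner(xi, xj) - (1 if i == j else 0)) < 1e-12

x = [1 + 2j, 3 - 1j]
y = [2j, 1 - 1j]

# (4) Expansion: x = sum_i (x, x_i) x_i.
coeffs = [inner(x, xi) for xi in basis]
rebuilt = [sum(c * xi[k] for c, xi in zip(coeffs, basis)) for k in range(2)]
assert all(abs(r - a) < 1e-12 for r, a in zip(rebuilt, x))

# (5) Parseval's identity: (x, y) = sum_i (x, x_i)(x_i, y).
parseval = sum(inner(x, xi) * inner(xi, y) for xi in basis)
assert abs(parseval - inner(x, y)) < 1e-12

# (6) ||x||^2 = sum_i |(x, x_i)|^2.
assert abs(sum(abs(c) ** 2 for c in coeffs) - inner(x, x).real) < 1e-12
```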


§ 64. Schwarz’s inequality 

Theorem. If x and y are vectors in an inner product space, then (Schwarz's
inequality)

$|(x, y)| \leq \|x\| \cdot \|y\|$.

proof. If y = 0, both sides vanish. If $y \neq 0$, then the set consisting
of the vector $y / \|y\|$ is orthonormal, and, consequently, by Bessel's
inequality

$|(x, y / \|y\|)|^2 \leq \|x\|^2$.

Since $(x, y / \|y\|) = (x, y) / \|y\|$, this is the asserted inequality.

The Schwarz inequality has important arithmetic, geometric, and ana- 
lytic consequences. 

(1) In any inner product space we define the distance $\delta(x, y)$ between
two vectors x and y by

$\delta(x, y) = \|x - y\| = \sqrt{(x - y, x - y)}$.

In order for $\delta$ to deserve to be called a distance, it should have the
following three properties:

(i) $\delta(x, y) = \delta(y, x)$,

(ii) $\delta(x, y) \geq 0$; $\delta(x, y) = 0$ if and only if x = y,

(iii) $\delta(x, y) \leq \delta(x, z) + \delta(z, y)$.

(In a vector space it is also pleasant to be sure that distance is invariant
under translations:

(iv) $\delta(x, y) = \delta(x + z, y + z)$.)

Properties (i), (ii), and (iv) are obviously possessed by the particular $\delta$ we
defined; the only question is the validity of the “triangle inequality” (iii).
To prove (iii), we observe that

$\|x + y\|^2 = (x + y, x + y) = \|x\|^2 + (x, y) + (y, x) + \|y\|^2$

$= \|x\|^2 + (x, y) + \overline{(x, y)} + \|y\|^2$

$= \|x\|^2 + 2 \, \mathrm{Re} \, (x, y) + \|y\|^2$

$\leq \|x\|^2 + 2 |(x, y)| + \|y\|^2$

$\leq \|x\|^2 + 2 \|x\| \cdot \|y\| + \|y\|^2$

$= (\|x\| + \|y\|)^2$;

replacing x by x - z and y by z - y, we obtain

$\|x - y\| \leq \|x - z\| + \|z - y\|$,

and this is equivalent to (iii). (We use Re $\zeta$ to denote the real part of the
complex number $\zeta$; if $\zeta = \xi + i\eta$, with real $\xi$ and $\eta$, then Re $\zeta = \xi$. The
imaginary part of $\zeta$, that is, the real number $\eta$, is denoted by Im $\zeta$.)

(2) In the Euclidean space $\mathcal{R}^n$, the expression

$\frac{(x, y)}{\|x\| \cdot \|y\|}$

gives the cosine of the angle between x and y. The Schwarz inequality in
this case merely amounts to the statement that the cosine of a real angle
is $\leq 1$.

(3) In the unitary space $\mathcal{C}^n$, the Schwarz inequality becomes the
so-called Cauchy inequality; it asserts that for any two sequences $(\xi_1, \dots, \xi_n)$
and $(\eta_1, \dots, \eta_n)$ of complex numbers, we have

$|\sum_{i=1}^{n} \xi_i \bar{\eta}_i|^2 \leq \sum_{i=1}^{n} |\xi_i|^2 \cdot \sum_{i=1}^{n} |\eta_i|^2$.

(4) In the space $\mathcal{P}$, the Schwarz inequality becomes

$|\int_0^1 x(t) \overline{y(t)} \, dt|^2 \leq \int_0^1 |x(t)|^2 \, dt \cdot \int_0^1 |y(t)|^2 \, dt$.

It is useful to observe that the relations mentioned in (1)-(4) above are
not only analogous to the general Schwarz inequality, but actually
consequences or special cases of it.
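
The Cauchy inequality and the triangle inequality lend themselves to a quick numerical experiment. The following is a sketch, not a proof: it assumes the coordinate inner product on n-tuples of complex numbers, and the helper names (`inner`, `norm`, `random_vector`) as well as the choice of 1000 random trials are ours.

```python
import math
import random


def inner(x, y):
    """Coordinate inner product: linear in x, conjugate linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def norm(x):
    """Norm ||x|| = sqrt((x, x))."""
    return math.sqrt(inner(x, x).real)


def random_vector(n):
    """A vector with n complex coordinates drawn from the unit square."""
    return [complex(random.uniform(-1, 1), random.uniform(-1, 1))
            for _ in range(n)]


random.seed(0)  # fixed seed so that the experiment is repeatable

for _ in range(1000):
    x, y = random_vector(4), random_vector(4)
    # Schwarz (Cauchy) inequality: |(x, y)| <= ||x|| * ||y||.
    assert abs(inner(x, y)) <= norm(x) * norm(y) + 1e-12
    # Triangle inequality in the form ||x + y|| <= ||x|| + ||y||.
    total = [a + b for a, b in zip(x, y)]
    assert norm(total) <= norm(x) + norm(y) + 1e-12

# When y is a scalar multiple of x, the Schwarz inequality is an equality.
x = [1 + 1j, 2, -1j]
y = [(2 - 1j) * a for a in x]
assert abs(abs(inner(x, y)) - norm(x) * norm(y)) < 1e-12
```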

(5) We mention in passing that there is room between the two notions 
(general vector spaces and inner product spaces) for an intermediate con- 
cept of some interest. This concept is that of a normed vector space, a 
vector space in which there is an acceptable definition of length, but noth- 
ing is said about angles. A norm in a (real or complex) vector space is a 
numerically valued function $\|x\|$ of the vectors x such that $\|x\| > 0$ unless
x = 0, $\|\alpha x\| = |\alpha| \cdot \|x\|$, and $\|x + y\| \leq \|x\| + \|y\|$. Our discussion
so far shows that an inner product space is a normed vector space;
the converse is not in general true. In other words, if all we are given is a
norm satisfying the three conditions just given, it may not be possible to
find an inner product for which (x, x) is identically equal to $\|x\|^2$. In
somewhat vague but perhaps suggestive terms, we may say that the norm 
in an inner product space has an essentially “quadratic” character that 
norms in general need not possess. 
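
One way to see the “quadratic” character at work is through the identity of § 62, Exercise 4 (e), which holds for the norm of every inner product space. The sketch below assumes the maximum norm on pairs of real numbers (the norm of Exercise 5 (b) at the end of § 65); the helper name `max_norm` and the two sample vectors are hypothetical choices. The identity fails for them, so that, under these assumptions, this norm cannot come from any inner product.

```python
def max_norm(x):
    """Maximum norm: the largest absolute value among the coordinates."""
    return max(abs(c) for c in x)


x = [1.0, 0.0]
y = [0.0, 1.0]
plus = [a + b for a, b in zip(x, y)]    # x + y = (1, 1)
minus = [a - b for a, b in zip(x, y)]   # x - y = (1, -1)

# Left and right sides of ||x+y||^2 + ||x-y||^2 = 2||x||^2 + 2||y||^2.
lhs = max_norm(plus) ** 2 + max_norm(minus) ** 2
rhs = 2 * max_norm(x) ** 2 + 2 * max_norm(y) ** 2

# The two sides disagree for the maximum norm.
assert lhs != rhs
```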


§ 65. Complete orthonormal sets 

Theorem. If $\mathcal{V}$ is an n-dimensional inner product space, then there exist
complete orthonormal sets in $\mathcal{V}$, and every complete orthonormal set in $\mathcal{V}$
contains exactly n elements. The orthogonal dimension of $\mathcal{V}$ is the same as
its linear dimension.

proof. To people not fussy about hunting for an element in a possibly 
uncountable set, the existence of complete orthonormal sets is obvious. 
Indeed, we have already seen that orthonormal sets exist, so we choose 
one; if it is not complete, we may enlarge it, and if the resulting orthonor- 
mal set is still not complete, we enlarge it again, and we proceed in this 
way by induction. Since an orthonormal set may contain at most n ele- 
ments, in at most n steps we shall reach a complete orthonormal set. This 
set spans the whole space (see § 63, Theorem 2, (1) => (3)), and, since it is
also linearly independent, it is a basis and therefore contains precisely n 
elements. This proves the first assertion of the theorem; the second asser- 
tion is now obvious from the definitions. 

There is a constructive method of avoiding this crude induction, and 
since it sheds further light on the notions involved, we reproduce it here 
as an alternative proof of the theorem. 

Let $\mathcal{X} = \{x_1, \dots, x_n\}$ be any basis in $\mathcal{V}$. We shall construct a complete
orthonormal set $\mathcal{Y} = \{y_1, \dots, y_n\}$ with the property that each $y_j$ is a
linear combination of $x_1, \dots, x_j$. To begin the construction, we observe
that $x_1 \neq 0$ (since $\mathcal{X}$ is linearly independent) and we write $y_1 = x_1 / \|x_1\|$.
Suppose now that $y_1, \dots, y_r$ have been found so that they form an
orthonormal set and so that each $y_j$ (j = 1, ..., r) is a linear combination of
$x_1, \dots, x_j$. We write

$z = x_{r+1} - (\alpha_1 y_1 + \dots + \alpha_r y_r)$,

where the values of the scalars $\alpha_1, \dots, \alpha_r$ are still to be determined. Since

$(z, y_j) = (x_{r+1} - \sum_i \alpha_i y_i, y_j) = (x_{r+1}, y_j) - \alpha_j$

for j = 1, ..., r, it follows that if we choose $\alpha_j = (x_{r+1}, y_j)$, then $(z, y_j) = 0$
for j = 1, ..., r. Since, moreover, z is a linear combination of $x_{r+1}$ and
$y_1, \dots, y_r$, it is also a linear combination of $x_{r+1}$ and $x_1, \dots, x_r$. Finally
z is different from zero, since $x_1, \dots, x_r, x_{r+1}$ are linearly independent and
the coefficient of $x_{r+1}$ in the expression for z is not zero. We write
$y_{r+1} = z / \|z\|$; clearly $\{y_1, \dots, y_r, y_{r+1}\}$ is again an orthonormal set with all
the desired properties, and the induction step is accomplished. We shall
make use of the fact that not only is each $y_j$ a linear combination of the x's
with indices between 1 and j, but, vice versa, each $x_j$ is a linear combination
of the y's with indices between 1 and j. The method of converting a
linear basis into a complete orthonormal set that we just described is known
as the Gram-Schmidt orthogonalization process.
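
The induction step translates directly into a loop. The following sketch assumes the coordinate inner product on n-tuples of complex numbers; the function name `gram_schmidt`, the tolerance `1e-12` used to detect a vanishing z, and the sample basis are our own illustrative choices, and no attention is paid to numerical stability.

```python
import math


def inner(x, y):
    """Coordinate inner product: linear in x, conjugate linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def norm(x):
    """Norm ||x|| = sqrt((x, x))."""
    return math.sqrt(inner(x, x).real)


def gram_schmidt(basis):
    """Convert a linearly independent list of vectors into an orthonormal list."""
    ys = []
    for x_next in basis:
        # Choose alpha_j = (x_{r+1}, y_j), so that z is orthogonal to each y_j.
        alphas = [inner(x_next, y) for y in ys]
        z = [x_next[k] - sum(a * y[k] for a, y in zip(alphas, ys))
             for k in range(len(x_next))]
        length = norm(z)
        if length < 1e-12:
            # z = 0 can happen only if the input is linearly dependent.
            raise ValueError("input vectors are linearly dependent")
        # Normalize: y_{r+1} = z / ||z||.
        ys.append([c / length for c in z])
    return ys


# A hypothetical basis of complex 3-space, chosen only for illustration.
basis = [[1, 1, 0], [1, 0, 1j], [0, 1, 1]]
ys = gram_schmidt(basis)

# The result is an orthonormal set with as many elements as the basis.
for i, yi in enumerate(ys):
    for j, yj in enumerate(ys):
        assert abs(inner(yi, yj) - (1 if i == j else 0)) < 1e-12
```

Since each new vector is built from the next x and the y's already found, the first vector of the output is a multiple of the first vector of the input, the second is a combination of the first two, and so on, as in the text.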

We shall find it convenient and natural, in inner product spaces, to 
work exclusively with such bases as are also complete orthonormal sets. 
We shall call such a basis an orthonormal basis or an orthonormal coordinate 
system; in the future, whenever we discuss bases that are not necessarily 
orthonormal, we shall emphasize this fact by calling them linear bases. 


EXERCISES 

1. Convert $\mathcal{P}_2$ into an inner product space by writing $(x, y) = \int_0^1 x(t) \overline{y(t)} \, dt$
whenever x and y are in $\mathcal{P}_2$, and find a complete orthonormal set in that space.

2. If x and y are orthogonal unit vectors (that is, {x, y} is an orthonormal set),
what is the distance between x and y?

3. Prove that if $|(x, y)| = \|x\| \cdot \|y\|$ (that is, if the Schwarz inequality reduces
to an equality), then x and y are linearly dependent.

4. (a) Prove that the Schwarz inequality remains true if, in the definition of an 
inner product, “strictly positive” is replaced by “non-negative.” 

(b) Prove that for a “non-negative” inner product of the type mentioned in 
(a), the set of all those vectors x for which (x, x) = 0 is a subspace. 

(c) Form the quotient space modulo the subspace mentioned in (b) and show 
that the given “inner product” induces on that quotient space, in a natural manner, 
an honest (strictly positive) inner product. 

(d) Do the considerations in (a), (b), and (c) extend to normed spaces (with 
possibly no inner product)? 

5. (a) Given a strictly positive number $\alpha$, try to define a norm in $\mathcal{R}^2$ by writing

$\|x\| = (|\xi_1|^\alpha + |\xi_2|^\alpha)^{1/\alpha}$

whenever $x = (\xi_1, \xi_2)$. Under what conditions on $\alpha$ does this equation define a
norm?

(b) Prove that the equation

$\|x\| = \max \{|\xi_1|, |\xi_2|\}$

defines a norm in $\mathcal{R}^2$.

(c) To which ones among the norms defined in (a) and (b) does there correspond
an inner product in $\mathcal{R}^2$ such that $\|x\|^2 = (x, x)$ for all x in $\mathcal{R}^2$?

6. (a) Prove that a necessary and sufficient condition on a real normed space that
there exist an inner product satisfying the equation $\|x\|^2 = (x, x)$ for all x is that

$\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2$

for all x and y.

(b) Discuss the corresponding assertion for complex spaces.

(c) Prove that a necessary and sufficient condition on a norm in $\mathcal{R}^2$ that there
exist an inner product satisfying the equation $\|x\|^2 = (x, x)$ for all x in $\mathcal{R}^2$ is that
the locus of the equation $\|x\| = 1$ be an ellipse.

7. If $\{x_1, \dots, x_n\}$ is a complete orthonormal set in an inner product space, and
if $y_j = \sum_{i=1}^{j} x_i$, j = 1, ..., n, express in terms of the x's the vectors obtained by
applying the Gram-Schmidt orthogonalization process to the y's.

§ 66. Projection theorem 

Since a subspace of an inner product space may itself be considered as 
an inner product space, the theorem of the preceding section may be ap- 
plied. The following result, called the projection theorem , is the most im- 
portant application. 

Theorem. If $\mathcal{M}$ is any subspace of a finite-dimensional inner product
space $\mathcal{V}$, then $\mathcal{V}$ is the direct sum of $\mathcal{M}$ and $\mathcal{M}^\perp$, and $\mathcal{M}^{\perp\perp} = \mathcal{M}$.

proof. Let $\mathcal{X} = \{x_1, \dots, x_m\}$ be an orthonormal set that is complete
in $\mathcal{M}$, and let z be any vector in $\mathcal{V}$. We write $x = \sum_i \alpha_i x_i$, where
$\alpha_i = (z, x_i)$; it follows from § 63, Theorem 1, that y = z - x is in $\mathcal{M}^\perp$, so that
z is the sum of two vectors, z = x + y, with x in $\mathcal{M}$ and y in $\mathcal{M}^\perp$. That
$\mathcal{M}$ and $\mathcal{M}^\perp$ are disjoint is clear; if x belonged to both, then we should have
$\|x\|^2 = (x, x) = 0$. It follows from the theorem of § 18 that
$\mathcal{V} = \mathcal{M} \oplus \mathcal{M}^\perp$.

We observe that in the decomposition z = x + y, we have

$(z, x) = (x + y, x) = \|x\|^2 + (y, x) = \|x\|^2$,

and, similarly,

$(z, y) = (x + y, y) = (x, y) + \|y\|^2 = \|y\|^2$.

Hence, if z is in $\mathcal{M}^{\perp\perp}$, so that (z, y) = 0, then $\|y\|^2 = 0$, so that z (= x)
is in $\mathcal{M}$; in other words, $\mathcal{M}^{\perp\perp}$ is contained in $\mathcal{M}$. Since we already know
that $\mathcal{M}$ is contained in $\mathcal{M}^{\perp\perp}$, the proof of the theorem is complete.

This kind of direct sum decomposition of an inner product space (via a 
subspace and its orthogonal complement) is of considerable geometric in- 
terest. We shall study the associated projections a little later; they turn 
out to be an interesting and important subclass of the class of all
projections. At present we remark only on the connection with the Pythagorean
theorem; since $(z, x) = \|x\|^2$ and $(z, y) = \|y\|^2$, we have

$\|z\|^2 = (z, z) = (z, x) + (z, y) = \|x\|^2 + \|y\|^2$.

In other words, the square of the hypotenuse is the sum of the squares of
the sides. More generally, if $\mathcal{M}_1, \dots, \mathcal{M}_k$ are pairwise orthogonal
subspaces in an inner product space $\mathcal{V}$, and if $x = x_1 + \dots + x_k$, with $x_j$ in
$\mathcal{M}_j$ for j = 1, ..., k, then

$\|x\|^2 = \|x_1\|^2 + \dots + \|x_k\|^2$.
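
The decomposition in the proof of the projection theorem, together with the Pythagorean relation, can be traced on a small real example. In the sketch below the subspace is the span of two orthonormal vectors in real 3-space; the vectors `xs` and `z` and the helper name `inner` are hypothetical choices made for illustration.

```python
import math


def inner(x, y):
    """Coordinate inner product; on real vectors the conjugation does nothing."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


s = 1 / math.sqrt(2)

# An orthonormal set, complete in the subspace M that it spans.
xs = [[1.0, 0.0, 0.0], [0.0, s, s]]
z = [2.0, 3.0, 1.0]

# x = sum alpha_i x_i with alpha_i = (z, x_i) lies in M; y = z - x.
alphas = [inner(z, xi) for xi in xs]
x = [sum(a * xi[k] for a, xi in zip(alphas, xs)) for k in range(3)]
y = [zk - xk for zk, xk in zip(z, x)]

# y is orthogonal to each x_i, and therefore to all of M.
for xi in xs:
    assert abs(inner(y, xi)) < 1e-12

# (z, x) = ||x||^2 and (z, y) = ||y||^2.
assert abs(inner(z, x) - inner(x, x)) < 1e-12
assert abs(inner(z, y) - inner(y, y)) < 1e-12

# Pythagorean theorem: ||z||^2 = ||x||^2 + ||y||^2.
assert abs(inner(z, z) - (inner(x, x) + inner(y, y))) < 1e-12
```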

§ 67. Linear functionals 

We are now in a position to study linear functionals on inner product 
spaces. For a general n-dimensional vector space the dual space is also 
n-dimensional and is therefore isomorphic to the original space. There is, 
however, no obvious natural isomorphism that we can set up; we have to 
wait for the second dual space to get back where we came from. The main 
point of the theorem we shall prove now is that in inner product spaces 
there is a “natural” correspondence between $\mathcal{V}$ and $\mathcal{V}'$; the only cloud on
the horizon is that in general it is not quite an isomorphism. 

Theorem. To any linear functional y' on a finite-dimensional inner product
space $\mathcal{V}$ there corresponds a unique vector y in $\mathcal{V}$ such that y'(x) = (x, y)
for all x.

proof. If y' = 0, we may choose y = 0; let us from now on assume that
y'(x) is not identically zero. Let $\mathcal{M}$ be the subspace consisting of all vectors
x for which y'(x) = 0, and let $\mathcal{N} = \mathcal{M}^\perp$ be the orthogonal complement of
$\mathcal{M}$. The subspace $\mathcal{N}$ contains a non-zero vector $y_0$; multiplying by a
suitable constant, we may assume that $\|y_0\| = 1$. We write $y = \overline{y'(y_0)} \cdot y_0$.
(The bar denotes complex conjugation, as usual; in case $\mathcal{V}$ is a real inner
product space and not a unitary space, the bar may be omitted.) We do
then have the desired relation

(1) y'(x) = (x, y)

at least for $x = y_0$ and for all x in $\mathcal{M}$. For an arbitrary x in $\mathcal{V}$, we write
$x_0 = x - \lambda y_0$, where

$\lambda = \frac{y'(x)}{y'(y_0)}$;

then $y'(x_0) = 0$ and $x = x_0 + \lambda y_0$ is a linear combination of two vectors
for each of which (1) is valid. From the linearity of both sides of (1) it
follows that (1) holds for x, as was to be proved.

To prove uniqueness, suppose that $(x, y_1) = (x, y_2)$ for all x. It follows
that $(x, y_1 - y_2) = 0$ for all x, and therefore in particular for $x = y_1 - y_2$,
so that $\|y_1 - y_2\|^2 = 0$ and $y_1 = y_2$.
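
In coordinates the vector y of the theorem can be written down at once. The sketch below does not follow the proof above; it uses instead a consequence of the theorem and of § 63, Theorem 2 (4): if $\{x_1, \dots, x_n\}$ is an orthonormal basis, then $y'(x_i) = (x_i, y)$, so that $y = \sum_i \overline{y'(x_i)} x_i$. We assume the coordinate inner product on pairs of complex numbers and the natural coordinate basis; the functional `functional` and the helper name `representing_vector` are hypothetical.

```python
def inner(x, y):
    """Coordinate inner product: linear in x, conjugate linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def functional(x):
    """A sample linear functional on complex 2-space (illustration only)."""
    return (2 + 1j) * x[0] - 3j * x[1]


def representing_vector(f, n):
    """Return y with f(x) = (x, y), using the coordinate basis e_1, ..., e_n."""
    y = []
    for i in range(n):
        e_i = [1 if k == i else 0 for k in range(n)]
        # The i-th coordinate of y is the conjugate of f(e_i).
        y.append(complex(f(e_i)).conjugate())
    return y


y = representing_vector(functional, 2)

# Check the relation y'(x) = (x, y) on a few sample vectors.
for x in ([1, 0], [0, 1], [1 + 1j, 2 - 3j], [-2j, 0.5]):
    assert abs(functional(x) - inner(x, y)) < 1e-12
```

The conjugation in the definition of the coordinates of y is the same bar that appears in the formula for y in the proof.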

The correspondence $y' \rightleftarrows y$ is a one-to-one correspondence between $\mathcal{V}$
and $\mathcal{V}'$, with the property that to $y'_1 + y'_2$ there corresponds $y_1 + y_2$, and
to $\alpha y'$ there corresponds $\bar{\alpha} y$; for this reason we refer to it as a conjugate
isomorphism. In spite of the fact that this conjugate isomorphism makes
$\mathcal{V}'$ practically indistinguishable from $\mathcal{V}$, it is wise to keep the two
conceptually separate. One reason for this is that we should like $\mathcal{V}'$ to be an
inner product space along with $\mathcal{V}$; if, however, we follow the clue given by
the conjugate isomorphism between $\mathcal{V}$ and $\mathcal{V}'$, the conjugation again causes
trouble. Let $y'_1$ and $y'_2$ be any two elements of $\mathcal{V}'$; if $y'_1(x) = (x, y_1)$ and
$y'_2(x) = (x, y_2)$, the temptation is great to write

$(y'_1, y'_2) = (y_1, y_2)$.

A moment's reflection will show that this expression may not satisfy § 61,
(2), and is therefore not a suitable inner product. The trouble arises in
complex (that is, unitary) spaces only; we have, for example,

$(\alpha y'_1, y'_2) = (\bar{\alpha} y_1, y_2) = \bar{\alpha} (y_1, y_2) = \bar{\alpha} (y'_1, y'_2)$.

The remedy is clear; we write

(2) $(y'_1, y'_2) = \overline{(y_1, y_2)} = (y_2, y_1)$;

we leave it to the reader to verify that with this definition $\mathcal{V}'$ becomes an
inner product space in all cases. We shall denote this inner product space
by $\mathcal{V}^*$.

We remark that our troubles (if they can be called that) with complex 
conjugation have so far been more notational than conceptual; it is still 
true that the only difference between the theory of Euclidean spaces and 
the theory of unitary spaces is that an occasional bar appears in the latter. 
More profound differences between the two theories will arise when we go 
to study linear transformations. 

§ 68. Parentheses versus brackets 

It becomes necessary now to straighten out the relation between general 
vector spaces and inner product spaces. The theorem of the preceding 
section shows that, as long as we are careful about complex conjugation, 
(x, y) can completely take the place of [x, y]. It might seem that it would 
have been desirable to develop the entire subject of general vector spaces 
in such a way that the concept of orthogonality in a unitary space becomes 
not merely an analogue but a special case of some previously studied general
relation between vectors and functionals. One way, for example, of
avoiding the unpleasantness of conjugation (or, rather, of shifting it to a less
conspicuous position) would have been to define the dual space of a
complex vector space as the set of conjugate linear functionals, that is, the set
of numerically valued functions y for which

$y(\alpha_1 x_1 + \alpha_2 x_2) = \bar{\alpha}_1 y(x_1) + \bar{\alpha}_2 y(x_2)$.

Because it seemed pointless (and contrary to common usage) to introduce 
this complication into the general theory, we chose instead the roundabout 
way that we just traveled. Since from now on we shall deal with inner 
product spaces only, we ask the reader mentally to revise all the preceding 
work by replacing, throughout, the bracket [x, y] by the parenthesis (x, y). 
Let us examine the effect of this change on the theorems and definitions of 
the first two chapters. 

The replacement of $\mathcal{V}'$ by $\mathcal{V}^*$ is merely a change of notation; the new
symbol is supposed to remind us that something new (namely, an inner
product) has been added to $\mathcal{V}'$. Of a little more interest is the (conjugate)
isomorphism between $\mathcal{V}$ and $\mathcal{V}^*$; by means of it the theorems of § 15,
asserting the existence of linear functionals with various properties, may
now be interpreted as asserting the existence of certain vectors in $\mathcal{V}$ itself.
Thus, for example, the existence of a dual basis to any given basis
$\mathcal{X} = \{x_1, \dots, x_n\}$ implies now the existence of a basis $\mathcal{Y} = \{y_1, \dots, y_n\}$ (of $\mathcal{V}$)
with the property that $(x_i, y_j) = \delta_{ij}$.

More exciting still is the implied replacement of the annihilator $\mathcal{M}^0$ of a
subspace $\mathcal{M}$ ($\mathcal{M}^0$ lying in $\mathcal{V}'$ or $\mathcal{V}^*$) by the orthogonal complement $\mathcal{M}^\perp$
(lying, along with $\mathcal{M}$, in $\mathcal{V}$). The most radical new development, however,
concerns the adjoint of a linear transformation. Thus we may write the
analogue of § 44, (1), and corresponding to every linear transformation A
on $\mathcal{V}$ we may define a linear transformation A* by writing

(Ax, y) = (x, A*y) 

for every x. It follows from this definition that A* is again a linear
transformation defined on the same vector space $\mathcal{V}$, but, because of the
Hermitian symmetry of (x, y), the relation between A and A* is not quite the
same as the relation between A and A'. The most notable difference is
that (in a unitary space) $(\alpha A)^* = \bar{\alpha} A^*$ (and not $(\alpha A)^* = \alpha A^*$). Associated
with this phenomenon is the fact that if the matrix of A, with respect
to some fixed basis, is $(\alpha_{ij})$, then the matrix of A*, with respect to the dual
basis, is not $(\alpha_{ji})$ but $(\bar{\alpha}_{ji})$. For determinants we do not have det A* =
det A but $\det A^* = \overline{\det A}$, and, consequently, the proper values of A* are
not the same as those of A, but rather their conjugates. Here, however,
the differences stop. All the other results of § 44 on the anti-isomorphic
nature of the correspondence $A \rightleftarrows A^*$ are valid; the identity A = A**
is strictly true and does not need the help of an isomorphism to interpret it.
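
These facts about A* are easy to verify for matrices. The following sketch makes two assumptions that go beyond the text at this point: the inner product is the coordinate one on pairs of complex numbers, and the basis is the natural coordinate basis, which is orthonormal, so that the matrix of A* is the conjugate transpose of the matrix of A. The helper names (`apply`, `adjoint`, `det2`) and the sample matrix are hypothetical.

```python
def inner(x, y):
    """Coordinate inner product: linear in x, conjugate linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def apply(A, x):
    """Apply the matrix A (a list of rows) to the vector x."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in A]


def adjoint(A):
    """Conjugate transpose: the (i, j) entry is the conjugate of A[j][i]."""
    return [[A[j][i].conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]


def det2(A):
    """Determinant of a 2-by-2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


A = [[1 + 2j, 3j], [2 - 1j, 4 + 0j]]
x = [1 - 1j, 2j]
y = [3 + 0j, 1 + 1j]
alpha = 2 - 3j

# The defining relation (Ax, y) = (x, A*y).
assert abs(inner(apply(A, x), y) - inner(x, apply(adjoint(A), y))) < 1e-12

# (alpha A)* = conj(alpha) A*, entry by entry.
alpha_A = [[alpha * a for a in row] for row in A]
left = adjoint(alpha_A)
right = [[alpha.conjugate() * a for a in row] for row in adjoint(A)]
assert all(abs(l - r) < 1e-12
           for lrow, rrow in zip(left, right) for l, r in zip(lrow, rrow))

# det A* is the conjugate of det A.
assert abs(det2(adjoint(A)) - det2(A).conjugate()) < 1e-12

# A** = A holds exactly.
assert adjoint(adjoint(A)) == A
```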

Presently we shall discuss linear transformations on inner product 
spaces and we shall see that the principal new feature that differentiates 
their study from the discussion of Chapter II is the possibility of compar- 
ing A and A* as linear transformations on the same space, and of investi- 
gating those classes of linear transformations that bear a particularly simple 
relation to their adjoints.

§ 69. Natural isomorphisms 

There is now only one more possible doubt that the reader might (or, at 
any rate, should) have. Many of our preceding results were consequences 
of such reflexivity relations as A** = A; do these remain valid after the 
brackets-to-parentheses revolution? More to the point is the following 
way of asking the question. Everything we say about a unitary space $\mathcal{V}$
must also be true about the unitary space $\mathcal{V}^*$; in particular it is also in a
natural conjugate isomorphic relation with its dual space $\mathcal{V}^{**}$. If now to
every vector in $\mathcal{V}$ we make correspond a vector in $\mathcal{V}^{**}$, by first applying
the natural conjugate isomorphism from $\mathcal{V}$ to $\mathcal{V}^*$ and then going the same
way from $\mathcal{V}^*$ to $\mathcal{V}^{**}$, then this mapping is a rival for the title of natural
mapping from $\mathcal{V}$ to $\mathcal{V}^{**}$, a title already awarded in Chapter I to a seemingly
different correspondence. What is the relation between the two natural
correspondences? Our statements about the coincidence, except for trivial
modifications, of the parenthesis and bracket theories, are really justified
by the fact, which we shall now prove, that the two mappings are the same.
(It should not be surprising, since $\bar{\bar{\alpha}} = \alpha$, that after two applications the
bothersome conjugation disappears.) The proof is shorter than the
introduction to it.

Let $y_0$ be any element of $\mathcal{V}$; to it there corresponds the linear functional
$y_0^*$ in $\mathcal{V}^*$, defined by $y_0^*(x) = (x, y_0)$, and to $y_0^*$, in turn, there corresponds
the linear functional $y_0^{**}$ in $\mathcal{V}^{**}$, defined by $y_0^{**}(y^*) = (y^*, y_0^*)$. Both
these correspondences are given by the mapping introduced in this chapter.
Earlier (see § 16) the correspondent $y_0^{**}$ in $\mathcal{V}^{**}$ of $y_0$ in $\mathcal{V}$ was defined by
$y_0^{**}(y^*) = y^*(y_0)$ for all $y^*$ in $\mathcal{V}^*$; we must show that $y_0^{**}$, as we here
defined it, satisfies this identity. Let $y^*$ be any linear functional on $\mathcal{V}$
(that is, any element of $\mathcal{V}^*$), and let y be the vector in $\mathcal{V}$ that corresponds
to it, so that $y^*(x) = (x, y)$ for all x; we have

$y_0^{**}(y^*) = (y^*, y_0^*) = (y_0, y) = y^*(y_0)$.

(The middle equality comes from the definition of inner product in $\mathcal{V}^*$.)
This settles all our problems.


EXERCISES

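1. If $\mathcal{M}$ and $\mathcal{N}$ are subspaces of a finite-dimensional inner product space, then

$(\mathcal{M} + \mathcal{N})^\perp = \mathcal{M}^\perp \cap \mathcal{N}^\perp$

and

$(\mathcal{M} \cap \mathcal{N})^\perp = \mathcal{M}^\perp + \mathcal{N}^\perp$.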

2. If $y'(x) = \frac{1}{3}(\xi_1 + \xi_2 + \xi_3)$ for each $x = (\xi_1, \xi_2, \xi_3)$ in $\mathcal{C}^3$, find a vector y in
$\mathcal{C}^3$ such that y'(x) = (x, y).

3. If y is a vector in an inner product space, if A is a linear transformation on 
that space, and if f(x) = (y, Ax) for every vector x, then f is a linear functional;
find a vector y* such that f(x) = (x, y*) for every x. 

4. (a) If A is a linear transformation on a finite-dimensional inner product space,
then $\mathrm{tr}\,(A^*A) \geq 0$; a necessary and sufficient condition that $\mathrm{tr}\,(A^*A) = 0$ is that
A = 0. (Hint: look at matrices.) This property of traces can often be used to
obtain otherwise elusive algebraic facts about products of transformations and their
adjoints.

(b) Prove by a trace argument, and also directly, that if $A_1, \dots, A_k$ are linear
transformations on a finite-dimensional inner product space and if $\sum_{j=1}^{k} A_j^* A_j = 0$,
then $A_1 = \dots = A_k = 0$.

(c) If $A^*A = B^*B - BB^*$, then A = 0.

(d) If A* commutes with A and if A commutes with B, then A* commutes with
B. (Hint: if $C = A^*B - BA^*$ and $D = AB - BA$, then $\mathrm{tr}\,(C^*C) = \mathrm{tr}\,(D^*D)$
$+ \, \mathrm{tr}\,[(A^*A - AA^*)(B^*B - BB^*)]$.)

5. (a) Suppose that X is a unitary space, and form the set of all ordered pairs
$\langle x, y \rangle$ with x and y in X (that is, the direct sum of X with itself). Prove that the
equation

$(\langle x_1, y_1 \rangle, \langle x_2, y_2 \rangle) = (x_1, x_2) + (y_1, y_2)$

defines an inner product in the direct sum $X \oplus X$.

(b) If U is defined by $U\langle x, y \rangle = \langle y, -x \rangle$, then U*U = 1.

(c) The graph of a linear transformation A on X is the set of all those elements
$\langle x, y \rangle$ of $X \oplus X$ for which y = Ax. Prove that the graph of every linear
transformation on X is a subspace of $X \oplus X$.

(d) If A is a linear transformation on X with graph $\mathcal{G}$, then the graph of A*
is the orthogonal complement (in $X \oplus X$) of the image under U (see (b)) of the
graph of A.

6. (a) If for every linear transformation A on a finite-dimensional inner product
space $N(A) = \sqrt{\mathrm{tr}\,(A^*A)}$, then N is a norm (on the space of all linear
transformations).

(b) Is the norm N induced by an inner product?

7. (a) Two linear transformations A and B on an inner product space are called
congruent if there exists an invertible linear transformation P such that B = P*AP.
(The concept is frequently defined for the “quadratic forms” associated with linear
transformations and not for the linear transformations themselves; this is largely
a matter of taste. Note that if $\alpha(x) = (Ax, x)$ and $\beta(x) = (Bx, x)$, then B = P*AP
implies that $\beta(x) = \alpha(Px)$.) Prove that congruence is an equivalence relation.

(b) If A and B are congruent, then so also are A* and B*.

(c) Does there exist a linear transformation A such that A is congruent to a
scalar $\alpha$, but $A \neq \alpha$?

(d) Do there exist linear transformations A and B such that A and B are
congruent, but $A^2$ and $B^2$ are not?

(e) If two invertible transformations are congruent, then so are their inverses.


§ 70. Self-adjoint transformations 

Let us now study the algebraic structure of the class of all linear
transformations on an inner product space $\mathcal{V}$. In many fundamental respects
this class resembles the class of all complex numbers. In both systems,
notions of addition, multiplication, 0, and 1 are defined and have similar
properties, and in both systems there is an involutory anti-automorphism
of the system onto itself (namely, $A \to A^*$ and $\zeta \to \bar\zeta$). We shall use
this analogy as a heuristic principle, and we shall attempt to carry over 
to linear transformations some well-known concepts from the complex 
domain. We shall be hindered in this work by two difficulties in the theory 
of linear transformations, of which, possibly surprisingly, the second is 
much more serious; they are the impossibility of unrestricted division and 
the non-commutativity of general linear transformations. 

The three most important subsets of the complex number plane are the 
set of real numbers, the set of positive real numbers, and the set of num- 
bers of absolute value one. We shall now proceed systematically to use 
our heuristic analogy of transformations with complex numbers, and to try 
to discover the analogues among transformations of these well-known nu- 
merical concepts. 

When is a complex number real? Clearly a necessary and sufficient
condition for the reality of $\zeta$ is the validity of the equation $\zeta = \bar\zeta$. We
might accordingly (remembering that the analogue of the complex
conjugate for linear transformations is the adjoint) define a linear
transformation A to be real if A = A*. More commonly linear transformations A
for which A = A* are called self-adjoint; in real inner product spaces the
usual word is symmetric, and, in complex inner product spaces, Hermitian.
We shall see that self-adjoint transformations do indeed play the same role 
as real numbers. 

It is quite easy to characterize the matrix of a self-adjoint
transformation with respect to an orthonormal basis $\mathcal{X} = \{x_1, \dots, x_n\}$. If the matrix
of A is $(\alpha_{ij})$, then we know that the matrix of A* with respect to the dual
basis of $\mathcal{X}$ is $(\alpha_{ij}^*)$, where $\alpha_{ij}^* = \bar\alpha_{ji}$; since an orthonormal basis is self-dual
and since A = A*, we have

$$\alpha_{ij} = \bar\alpha_{ji}.$$

We leave it to the reader to verify the converse: if we define a linear
transformation A by means of a matrix $(\alpha_{ij})$ and an arbitrary orthonormal
coordinate system $\mathcal{X} = \{x_1, \dots, x_n\}$, via the usual equations

$$A\Bigl(\sum_j \xi_j x_j\Bigr) = \sum_i \eta_i x_i, \qquad \eta_i = \sum_j \alpha_{ij} \xi_j,$$

and if the matrix $(\alpha_{ij})$ is such that $\alpha_{ij} = \bar\alpha_{ji}$, then A is self-adjoint.

The algebraic rules for the manipulation of self-adjoint transformations 
are easy to remember if we think of such transformations as the analogues 
of real numbers. Thus, if A and B are self-adjoint, so is A + B; if A is
self-adjoint and different from 0, and if $\alpha$ is a non-zero scalar, then a
necessary and sufficient condition that $\alpha A$ be self-adjoint is that $\alpha$ be real; and
if A is invertible, then both or neither of A and $A^{-1}$ are self-adjoint. The
place where something always goes wrong is in multiplication; the product 
of two self-adjoint transformations need not be self-adjoint. The positive 
facts about products are given by the following two theorems. 

Theorem 1. If A and B are self-adjoint, then a necessary and sufficient
condition that AB (or BA) be self-adjoint is that AB = BA (that is, that
A and B commute).


proof. If AB = BA, then (AB)* = B*A* = BA = AB. If (AB)* =
AB, then AB = (AB)* = B*A* = BA.


Theorem 2. If A is self-adjoint, then B*AB is self-adjoint for all B; if B 
is invertible and B*AB is self-adjoint, then A is self-adjoint. 

proof. If A = A*, then (B*AB)* = B*A*B** = B*AB. If B is
invertible and B*AB = (B*AB)* = B*A*B, then (multiply by $(B^*)^{-1}$ on the
left and $B^{-1}$ on the right) A = A*.

A complex number $\zeta$ is purely imaginary if and only if $\bar\zeta = -\zeta$. The
corresponding concept for linear transformations is identified by the word 
skew; if a linear transformation A on an inner product space is such that 
A* = —A, then A is called skew symmetric or skew Hermitian according as 
the space is real or complex. Here is some evidence for the thoroughgoing 
nature of our analogy between complex numbers and linear transforma- 
tions: an arbitrary linear transformation A may be expressed, in one and 
only one way, in the form A = B + C, where B is self-adjoint and C is 
skew. (The representation of A in this form is sometimes called the
Cartesian decomposition of A.) Indeed, if we write

(1) $B = \frac{A + A^*}{2},$

(2) $C = \frac{A - A^*}{2},$

then we have $B^* = \frac{A^* + A}{2} = B$ and $C^* = \frac{A^* - A}{2} = -C$, and, of
course, A = B + C. From this proof of the existence of the Cartesian
decomposition, its uniqueness is also clear; if we do have A = B + C, then
A* = B - C, and, consequently, A, B, and C are again connected by (1)
and (2).

In the complex case there is a simple way of getting skew Hermitian
transformations from Hermitian ones, and vice versa: just multiply by
$i$ ($= \sqrt{-1}$). It follows that, in the complex case, every linear
transformation A has a unique representation in the form A = B + iC, where B and
C are Hermitian. We shall refer to B and C as the real and imaginary 
parts of A. 
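
The Cartesian decomposition can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes that transformations are given by their matrices with respect to an orthonormal basis (so that the adjoint is the conjugate transpose), and the helper names `adjoint`, `combine`, and `cartesian` are illustrative choices.

```python
# Matrices are lists of rows of Python complex numbers, taken with respect
# to an orthonormal basis (an assumption of this sketch).

def adjoint(a):
    """Conjugate transpose: the matrix of A* in an orthonormal basis."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]

def combine(a, b, s, t):
    """Entrywise linear combination s*a + t*b."""
    return [[s * x + t * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def cartesian(a):
    """Return (B, C) with A = B + C, B self-adjoint and C skew."""
    a_star = adjoint(a)
    b = combine(a, a_star, 0.5, 0.5)    # B = (A + A*)/2
    c = combine(a, a_star, 0.5, -0.5)   # C = (A - A*)/2
    return b, c

A = [[1 + 2j, 3 + 0j], [4j, 5 - 1j]]
B, C = cartesian(A)

# In the complex case, dividing the skew part by i gives the Hermitian
# "imaginary part", so that A = B + i * C_herm.
C_herm = combine(C, C, -1j, 0)
```

Here B is the real part of A and C_herm is its imaginary part in the sense of the preceding paragraph.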


EXERCISES 


1. Give an example of two self-adjoint transformations whose product is not 
self-adjoint. 

2. Consider the space $\mathcal{P}_n$ with the inner product given by $(x, y) = \int_0^1 x(t)\overline{y(t)}\,dt$.

(a) Is the multiplication operator T (defined by (Tx)(t) = tx(t)) self-adjoint? 

(b) Is the differentiation operator D self-adjoint? 


3. (a) Prove that the equation (x, y) = … defines an inner product in the space $\mathcal{P}_n$.

(b) Is the multiplication operator T (defined by (Tx)(t) = tx(t)) self-adjoint (with 
respect to the inner product defined in (a))? 

(c) Is the differentiation operator D self-adjoint? 


4. If A and B are linear transformations such that A and AB are self-adjoint
and such that $\mathcal{N}(A) \subset \mathcal{N}(B)$, then there exists a self-adjoint transformation C
such that CA = B.


5. If A and B are congruent and A is skew, does it follow that B is skew? 

6. If A is skew, does it follow that so is $A^2$? How about $A^3$?

7. If both A and B are self-adjoint, or else if both are skew, then AB + BA is 
self-adjoint and AB — BA is skew. What happens if one of A and B is self-adjoint 
and the other skew? 


8. If A is a skew-symmetric transformation on a Euclidean space, then (Ax, x)
= 0 for every vector x. Converse?

9. If A is self-adjoint, or skew, and if $A^2 x = 0$, then Ax = 0.

10. (a) If A is a skew-symmetric transformation on a Euclidean space of odd 
dimension, then det A = 0. 

(b) If A is a skew-symmetric transformation on a finite-dimensional Euclidean 
space, then $\rho(A)$ is even.


§ 71. Polarization

Before continuing with the program of studying the analogies between 
complex numbers and linear transformations, we take time out to pick up 
some important auxiliary results about inner product spaces. 

Theorem 1. A necessary and sufficient condition that a linear
transformation A on an inner product space be 0 is that (Ax, y) = 0 for all x and y.

proof. The necessity of the condition is obvious; sufficiency follows 
from setting y equal to Ax. 

Theorem 2. A necessary and sufficient condition that a self-adjoint linear
transformation A on an inner product space be 0 is that (Ax, x) = 0 for
all x.

proof. Necessity is obvious. The proof of sufficiency begins by
verifying the identity

(1) $(Ax, y) + (Ay, x) = (A(x + y), x + y) - (Ax, x) - (Ay, y).$

(Expand the first term on the right side.) Since A is self-adjoint, the left
side of this equation is equal to $2\,\mathrm{Re}\,(Ax, y)$. The assumed condition
implies that the right side vanishes, and hence that $\mathrm{Re}\,(Ax, y) = 0$. At this
point it is necessary to split the proof into two cases. If the inner product
space is real (that is, A is symmetric), then (Ax, y) is real, and therefore
(Ax, y) = 0. If the inner product space is complex (that is, A is
Hermitian), then we find a complex number $\theta$ such that $|\theta| = 1$ and
$\theta(Ax, y) = |(Ax, y)|$. (Here x and y are temporarily fixed.) The result we already
have, applied to $\theta x$ in place of x, yields
$0 = \mathrm{Re}\,(A(\theta x), y) = \mathrm{Re}\,\theta(Ax, y) = \mathrm{Re}\,|(Ax, y)| = |(Ax, y)|$.
In either case, therefore, (Ax, y) = 0 for all
x and y, and the desired result follows from Theorem 1.

It is useful to ask how important is the self-adjointness of A in Theorem 
2; the answer is that in the complex case it is not important at all. 

Theorem 3. A necessary and sufficient condition that a linear
transformation A on a unitary space be 0 is that (Ax, x) = 0 for all x.

proof. As before, necessity is obvious. For the proof of sufficiency we
use the so-called polarization identity:

(2) $\alpha\bar\beta(Ax, y) + \bar\alpha\beta(Ay, x) = (A(\alpha x + \beta y), \alpha x + \beta y) - |\alpha|^2 (Ax, x) - |\beta|^2 (Ay, y).$

(Just as for (1), the proof consists of expanding the first term on the right.)
If (Ax, x) is identically zero, then we obtain, first choosing $\alpha = \beta = 1$,
and then $\alpha = i$ ($= \sqrt{-1}$), $\beta = 1$,

$$(Ax, y) + (Ay, x) = 0,$$
$$i(Ax, y) - i(Ay, x) = 0.$$

Dividing the second of these two equations by i and then forming their
arithmetic mean, we see that (Ax, y) = 0 for all x and y, so that, by
Theorem 1, A = 0.

This process of polarization is often used to get information about the 
“bilinear form” (Ax, y) when only knowledge of the “quadratic form”
(Ax, x) is assumed.
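
The polarization process can be made concrete. The sketch below is an illustration under stated assumptions, not the text's own construction: vectors are coordinate lists with respect to an orthonormal basis, the inner product is linear in its first argument and conjugate-linear in its second, and the names `inner`, `apply`, `quad`, and `polarize` are hypothetical. It recovers (Ax, y) from values of the quadratic form alone, using the two special cases of the polarization identity employed in the proof of Theorem 3.

```python
def inner(u, v):
    """(u, v): linear in u, conjugate-linear in v."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def apply(a, x):
    """Coordinates of Ax for a matrix a and coordinate vector x."""
    return [sum(r * t for r, t in zip(row, x)) for row in a]

def quad(a, x):
    """The quadratic form (Ax, x)."""
    return inner(apply(a, x), x)

def polarize(a, x, y):
    """Recover (Ax, y) using only values of the quadratic form."""
    qx, qy = quad(a, x), quad(a, y)
    x_plus_y = [s + t for s, t in zip(x, y)]
    ix_plus_y = [1j * s + t for s, t in zip(x, y)]
    s1 = quad(a, x_plus_y) - qx - qy     # (Ax, y) + (Ay, x)
    s2 = quad(a, ix_plus_y) - qx - qy    # i(Ax, y) - i(Ay, x)
    return (s1 + s2 / 1j) / 2            # divide by i, then average

A = [[1, 2j], [3, 4]]      # deliberately not Hermitian
x = [1, 1j]
y = [2, -1]
direct = inner(apply(A, x), y)
recovered = polarize(A, x, y)
```

The two values `direct` and `recovered` agree, although `polarize` never evaluates an inner product of the form (Ax, y) with y different from x.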

It is important to observe that, despite its seeming innocence, Theorem 3 
makes very essential use of the complex number system; it and many of its 
consequences fail to be true for real inner product spaces. The proof, of 
course, breaks down at our choice of $\alpha = \sqrt{-1}$. For an example consider
a 90° rotation of the plane; it clearly has the property that it sends every 
vector x into a vector orthogonal to x. 
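
The rotation example can be verified directly. This is a small sketch under the assumption that the rotation is represented by its matrix in the standard orthonormal basis of the plane; the helper names are illustrative. The last lines show that the same matrix, regarded as a transformation on a two-dimensional unitary space, no longer has an identically vanishing quadratic form, in agreement with Theorem 3.

```python
def inner(u, v):
    """(u, v): linear in u, conjugate-linear in v."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def apply(a, x):
    """Coordinates of Ax for a matrix a and coordinate vector x."""
    return [sum(r * t for r, t in zip(row, x)) for row in a]

R = [[0, -1], [1, 0]]   # rotation of the plane through 90 degrees

# On real vectors (Rx, x) vanishes identically, although R is not 0.
real_vectors = [[1, 0], [2, -3], [-1, 5]]
real_forms = [inner(apply(R, x), x) for x in real_vectors]

# On a complex vector the quadratic form of the same matrix is not zero.
z = [1, 1j]
complex_form = inner(apply(R, z), z)
```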

We have seen that Hermitian transformations play the same role as real 
numbers; the following theorem indicates that they are tied up with the 
concept of reality in deeper ways than through the formal analogy that 
suggested their definition. 

Theorem 4. A necessary and sufficient condition that a linear
transformation A on a unitary space be Hermitian is that (Ax, x) be real for all x.

proof. If A = A*, then

$$(Ax, x) = (x, A^*x) = (x, Ax) = \overline{(Ax, x)},$$

so that (Ax, x) is equal to its own conjugate and is therefore real. If,
conversely, (Ax, x) is always real, then

$$(Ax, x) = \overline{(Ax, x)} = \overline{(x, A^*x)} = (A^*x, x),$$

so that $([A - A^*]x, x) = 0$ for all x, and, by Theorem 3, A = A*.

Theorem 4 is false for real inner product spaces. This is to be expected, 
for, in the first place, its proof depends on a theorem that is true for unitary 
spaces only, and, in the second place, in a real space the reality of (Ax, x)
is automatic, whereas the identity (Ax, y) = (x, Ay) is not necessarily
satisfied. 

§ 72. Positive transformations 

When is a complex number $\zeta$ positive (that is, $\ge 0$)? Two equally natural
necessary and sufficient conditions are that $\zeta$ may be written in the form
$\zeta = \xi^2$ with some real $\xi$, or that $\zeta$ may be written in the form $\zeta = \bar\sigma\sigma$ with
some $\sigma$ (in general complex). Remembering also the fact that (at least for
unitary spaces) the Hermitian character of a transformation A can be 
described in terms of the inner products (Ax, x), we may consider any one 
of the three conditions below and attempt to use it as the definition of posi- 
tiveness for transformations: 

(1) $A = B^2$ for some self-adjoint B,

(2) A = C*C for some C,

(3) A is self-adjoint and $(Ax, x) \ge 0$ for all x.

Before deciding which one of these three conditions to use as definition, we 
observe that (1) => (2) => (3). Indeed: if $A = B^2$ and B = B*, then A
= BB = B*B, and if A = C*C, then A* = C*C = A and (Ax, x) =
$(C^*Cx, x) = (Cx, Cx) = \|Cx\|^2 \ge 0$. It is actually true that (3) implies
(1), so that the three conditions are equivalent, but we shall not be able to 
prove this until later. We adopt as our definition the third condition. 
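
The implication (2) => (3) is easy to illustrate with a small matrix. The following sketch assumes matrices with respect to an orthonormal basis, and the particular matrix C is an arbitrary choice made for illustration; it forms A = C*C and records both sides of the identity (Ax, x) = || Cx ||^2 for a few vectors.

```python
def adjoint(a):
    """Conjugate transpose: the matrix of A* in an orthonormal basis."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Matrix product of two square matrices of the same size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def apply(a, x):
    """Coordinates of Ax for a matrix a and coordinate vector x."""
    return [sum(r * t for r, t in zip(row, x)) for row in a]

def inner(u, v):
    """(u, v): linear in u, conjugate-linear in v."""
    return sum(s * t.conjugate() for s, t in zip(u, v))

C = [[1, 2j], [0, 1 - 1j]]     # an arbitrary (not self-adjoint) matrix
A = matmul(adjoint(C), C)      # A = C*C, which equals [[1, 2j], [-2j, 6]]

samples = [[1, 1j], [2, -3], [0, 1j]]
forms = [inner(apply(A, x), x) for x in samples]                 # (Ax, x)
norms_sq = [inner(apply(C, x), apply(C, x)) for x in samples]    # ||Cx||^2
```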

Definition. A linear transformation A on an inner product space is
positive, in symbols $A \ge 0$, if it is self-adjoint and if $(Ax, x) \ge 0$ for all x.

More generally, we shall write $A \ge B$ (or $B \le A$) whenever $A - B \ge 0$.
Although, of course, it is quite possible that the difference of two trans- 
formations that are not even self-adjoint turns out to be positive, we shall 
generally write inequalities for self-adjoint transformations only. Observe 
that for a complex inner product space a part of the definition of
positiveness is superfluous; if $(Ax, x) \ge 0$ for all x, then, in particular, (Ax, x) is
real for all x, and, by Theorem 4 of the preceding section, A must be 
positive. 

Positive transformations are usually called non-negative semidefinite. If
$A \ge 0$ and (Ax, x) = 0 implies that x = 0, we shall say that A is strictly
positive; the usual term is positive definite. Since the Schwarz inequality
implies that

$$|(Ax, x)| \le \|Ax\| \cdot \|x\|,$$

we see that if A is a strictly positive transformation and if Ax = 0, then 
x = 0, so that, on a finite-dimensional inner product space, a strictly posi- 
tive transformation is invertible. We shall see later that the converse is 
true; if $A \ge 0$ and A is invertible, then A is strictly positive. It is
sometimes convenient to indicate the fact that a transformation A is strictly
positive by writing A > 0; if A - B > 0, we may also write A > B (or
B < A). 

It is possible to give a matricial characterization of positive
transformations; we shall postpone this discussion till later. In the meantime we
shall have occasion to refer to positive matrices, meaning thereby
Hermitian symmetric matrices $(\alpha_{ij})$ (that is, $\alpha_{ij} = \bar\alpha_{ji}$) with the property that
for every sequence $(\xi_1, \dots, \xi_n)$ of n scalars we have $\sum_i \sum_j \alpha_{ij} \bar\xi_i \xi_j \ge 0$.

(In the real case the bars may be omitted; in the complex case Hermitian
symmetry follows from the other condition.) These conditions are clearly
equivalent to the condition that $(\alpha_{ij})$ be the matrix, with respect to some
orthonormal coordinate system, of a positive transformation. 

The algebraic rules for combining positive transformations are similar 
to those for self-adjoint transformations as far as sums, scalar multiples, 
and inverses are concerned; even § 70, Theorem 2, remains valid if we re- 
place “self-adjoint” by “positive” throughout. It is also true that if A 
and B are positive, then a necessary and sufficient condition that AB (or 
BA) be positive is that AB = BA (that is, that A and B commute), but 
we shall have to postpone the proof of this statement for a while. 


EXERCISES 


1. Under what conditions on a linear transformation A does the function of 
two variables, whose value at x and y is (Ax, y), satisfy the conditions on an inner
product? 


2. Which of the following matrices are positive? [The matrices for this exercise are illegible in the source.]

3. For which values of $\alpha$ is the matrix [matrix illegible in the source] positive?


4. (a) If A is self-adjoint, then tr A is real. 

(b) If $A \ge 0$, then $\operatorname{tr} A \ge 0$.

5. (a) Give an example of a positive matrix some of whose entries are negative.

(b) Give an example of a non-positive matrix all of whose entries are positive.

6. A necessary and sufficient condition that a two-by-two matrix $\begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}$
(considered as a linear transformation on $\mathcal{C}^2$) be positive is that it be Hermitian
symmetric (that is, that $\alpha$ and $\delta$ be real and $\gamma = \bar\beta$) and that $\alpha \ge 0$, $\delta \ge 0$, and $\alpha\delta - \beta\gamma \ge 0$.

7. Associated with each sequence $(x_1, \dots, x_k)$ of k vectors in an inner product
space there is a k-by-k matrix (not a linear transformation) called the Gramian of
$(x_1, \dots, x_k)$ and denoted by $G(x_1, \dots, x_k)$; the element in the i-th row and j-th
column of $G(x_1, \dots, x_k)$ is the inner product $(x_i, x_j)$. Prove that every Gramian is
a positive matrix.

8. If x and y are non-zero vectors (in a finite-dimensional inner product space), 
then a necessary and sufficient condition that there exist a positive transformation 
A such that Ax = y is that (x, y) > 0.


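9. (a) If the matrices $A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$ and $B = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$ are considered as linear
transformations on $\mathcal{C}^2$, and if C is a Hermitian matrix (linear transformation on
$\mathcal{C}^2$) such that $A \le C$ and $B \le C$, then

$$C = \begin{pmatrix} 1 + \epsilon & \theta \\ \bar\theta & 1 + \delta \end{pmatrix},$$

where $\epsilon$ and $\delta$ are positive real numbers and $|\theta|^2 \le \min\{\epsilon(1 + \delta), \delta(1 + \epsilon)\}$.

(b) If, moreover, $C \le 1$, then $\epsilon = \delta = \theta = 0$. In modern terminology these
facts together show that Hermitian matrices with the ordering induced by the
notion of positiveness do not form a lattice. In the real case, if the matrix $\begin{pmatrix} \alpha & \beta \\ \beta & \gamma \end{pmatrix}$ is
interpreted as the point $(\alpha, \beta, \gamma)$ in three-dimensional space, the ordering and its
non-lattice character take on an amusing geometric aspect.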


§ 73. Isometries 

We continue with our program of investigating the analogy between 
numbers and transformations. When does a complex number $\zeta$ have
absolute value one? Clearly a necessary and sufficient condition is that $\bar\zeta =
1/\zeta$; guided by our heuristic principle, we are led to consider linear
transformations U for which $U^* = U^{-1}$, or, equivalently, for which UU* =
U*U = 1. (We observe that on a finite-dimensional vector space either
of the two conditions UU* = 1 and U*U = 1 implies the other; see § 36,
Theorems 1 and 2.) Such transformations are called orthogonal or unitary 
according as the underlying inner product space is real or complex. We 
proceed to derive a couple of useful alternative characterizations of them. 

Theorem. The following three conditions on a linear transformation U on 

an inner product space are equivalent to each other. 

(1) U*U = 1, 

(2) (Ux, Uy) = (x, y) for all x and y, 

(3) || Ux || = || x || for all x. 

proof. If (1) holds, then

$$(Ux, Uy) = (U^*Ux, y) = (x, y)$$

for all x and y, and, in particular,

$$\|Ux\|^2 = \|x\|^2$$

for all x; this proves both the implications (1) => (2) and (2) => (3). The
proof can be completed by showing that (3) implies (1). If (3) holds, that
is, if (U*Ux, x) = (x, x) for all x, then § 71, Theorem 2 is applicable to the
(self-adjoint) transformation U*U - 1; the conclusion is that U*U = 1
(as desired). 

Since (3) implies that 

(4) $\|Ux - Uy\| = \|x - y\|$

for all x and y (the converse implication (4) => (3) is also true and trivial), 
we see that transformations of the type that the theorem deals with are 
characterized by the fact that they preserve distances. For this reason we 
shall call such a transformation an isometry. Since, as we have already
remarked, an isometry on a finite-dimensional space is necessarily orthog- 
onal or unitary (according as the space is real or complex), use of this 
terminology will enable us to treat the real and the complex cases simulta- 
neously. We observe that (on a finite-dimensional space) an isometry is 
always invertible and that $U^{-1}$ ($= U^*$) is an isometry along with U.

In any algebraic system, and in particular in general vector spaces and 
inner product spaces, it is of interest to consider the automorphisms of the 
system, that is, to consider those one-to-one mappings of the system onto 
itself that preserve all the structural relations among its elements. We 
have already seen that the automorphisms of a general vector space are 
the invertible linear transformations. In an inner product space we re- 
quire more of an automorphism, namely, that it also preserve inner prod- 
ucts (and consequently lengths and distances). The preceding theorem 
shows that this requirement is equivalent to the condition that the trans- 
formation be an isometry. (We are assuming finite-dimensionality here; 
on infinite-dimensional spaces the range of an isometry need not be the 
entire space. This unimportant sacrifice in generality is for the sake of 
terminological convenience; for infinite-dimensional spaces there is no com- 
monly used word that describes orthogonal and unitary transformations 
simultaneously.) Thus the two questions “What linear transformations 
are the analogues of complex numbers of absolute value one?” and “What 
are the most general automorphisms of a finite-dimensional inner product 
space?” have the same answer: isometries. In the next section we shall 
show that isometries also furnish the answer to a third important question. 


§ 74. Change of orthonormal basis

We have seen that the theory of the passage from one linear basis of a 
vector space to another is best studied by means of an associated linear 
transformation A (§§46, 47); the question arises as to what special proper- 
ties A has when we pass from one orthonormal basis of an inner product 
space to another. The answer is easy. 

Theorem 1. If $\mathcal{X} = \{x_1, \dots, x_n\}$ is an orthonormal basis of an
n-dimensional inner product space $\mathcal{V}$, and if U is an isometry on $\mathcal{V}$, then $U\mathcal{X} =
\{Ux_1, \dots, Ux_n\}$ is also an orthonormal basis of $\mathcal{V}$. Conversely, if U is a
linear transformation and $\mathcal{X}$ is an orthonormal basis with the property that
$U\mathcal{X}$ is also an orthonormal basis, then U is an isometry.

proof. Since $(Ux_i, Ux_j) = (x_i, x_j) = \delta_{ij}$, it follows that $U\mathcal{X}$ is an
orthonormal set along with $\mathcal{X}$; it is complete if $\mathcal{X}$ is, since $(x, Ux_i) = 0$ for
$i = 1, \dots, n$ implies that $(U^*x, x_i) = 0$ and hence that $U^*x = x = 0$. If,
conversely, $U\mathcal{X}$ is a complete orthonormal set along with $\mathcal{X}$, then we have
(Ux, Uy) = (x, y) whenever x and y are in $\mathcal{X}$, and it is clear that by
linearity we obtain (Ux, Uy) = (x, y) for all x and y.

We observe that the matrix $(\alpha_{ij})$ of an isometric transformation, with
respect to an arbitrary orthonormal basis, satisfies the conditions

$$\sum_k \bar\alpha_{ki} \alpha_{kj} = \delta_{ij},$$

and that, conversely, any such matrix, together with an orthonormal basis, 
defines an isometry. (Proof: U*U = 1. In the real case the bars may be 
omitted.) For brevity we shall say that a matrix satisfying these condi- 
tions is an isometric matrix. 
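
The matrix conditions for an isometry are mechanical to test. The sketch below is illustrative only: it assumes floating-point entries and an arbitrary tolerance, and the function names are hypothetical. It checks the conditions entry by entry (they say that U*U = 1) and compares the norms of x and Ux for one vector.

```python
import math

def is_isometric(u, tol=1e-12):
    """Check that sum_k conj(u[k][i]) * u[k][j] equals delta_ij."""
    n = len(u)
    for i in range(n):
        for j in range(n):
            s = sum(u[k][i].conjugate() * u[k][j] for k in range(n))
            if abs(s - (1 if i == j else 0)) > tol:
                return False
    return True

def apply(a, x):
    """Coordinates of Ax for a matrix a and coordinate vector x."""
    return [sum(r * t for r, t in zip(row, x)) for row in a]

def norm(x):
    """Length of a coordinate vector in an orthonormal basis."""
    return math.sqrt(sum(abs(t) ** 2 for t in x))

r = 1 / math.sqrt(2)
U = [[r, r * 1j], [r * 1j, r]]   # an isometric (unitary) matrix
T = [[1, 1], [0, 1]]             # invertible, but not isometric

x = [3, 4j]
lengths = (norm(x), norm(apply(U, x)))   # both are 5, up to rounding
```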

An interesting and easy consequence of our considerations concerning 
isometries is the following corollary of § 56, Theorem 1. 

Theorem 2. If A is a linear transformation on a complex n-dimensional
inner product space $\mathcal{V}$, then there exists an orthonormal basis $\mathcal{X}$ in $\mathcal{V}$ such
that the matrix $[A; \mathcal{X}]$ is triangular, or equivalently, if [A] is a matrix, then
there exists an isometric matrix [U] such that $[U]^{-1}[A][U]$ is triangular.

proof. In § 56, in the derivation of Theorem 2 from Theorem 1, we
constructed a (linear) basis $\mathcal{X} = \{x_1, \dots, x_n\}$ with the property that $x_1,
\dots, x_j$ lie in $\mathcal{M}_j$ and span $\mathcal{M}_j$ for $j = 1, \dots, n$, and we showed that with
respect to this basis the matrix of A is triangular. If we knew that this
basis is also an orthonormal basis, we could apply Theorem 1 of the present
section to obtain the desired result. If $\mathcal{X}$ is not an orthonormal basis, it is
easy to make it into one; this is precisely what the Gram-Schmidt
orthogonalization process (§ 65) can do. Here we use a special property of the
Gram-Schmidt process, namely, that the j-th element of the orthonormal
basis it constructs is a linear combination of $x_1, \dots, x_j$ and lies therefore
in $\mathcal{M}_j$.


EXERCISES

1. If $(Ax)(t) = x(-t)$ on $\mathcal{P}$ (with the inner product given by $(x, y) = \int_0^1 x(t)\overline{y(t)}\,dt$),
is the linear transformation A isometric? Is it self-adjoint?


2. For which values of $\alpha$ are the following matrices isometric? [The matrices (a) and (b) are illegible in the source.]


3. Find a 3-by-3 isometric matrix whose first row is a multiple of (1, 1, 1). 

4. If a linear transformation has any two of the properties of being self-adjoint, 
isometric, or involutory, then it has the third. (Recall that an involution is a 
linear transformation A such that $A^2 = 1$.)


5. If an isometric matrix is triangular, then it is diagonal. 


6. If $(x_1, \dots, x_k)$ and $(y_1, \dots, y_k)$ are two sequences of vectors in the same inner
product space, then a necessary and sufficient condition that there exist an isometry
U such that $Ux_i = y_i$, $i = 1, \dots, k$, is that $(x_1, \dots, x_k)$ and $(y_1, \dots, y_k)$ have the
same Gramian.


7. The mapping $\zeta \to \frac{\zeta + 1}{\zeta - 1}$ maps the imaginary axis in the complex plane once
around the unit circle, missing the point 1; the inverse mapping (from the circle
minus a point to the imaginary axis) is given by the same formula. The
transformation analogues of these geometric facts are as follows.

(a) If A is skew, then A - 1 is invertible.

(b) If $U = (A + 1)(A - 1)^{-1}$, then U is isometric. (Hint: $\|(A + 1)y\|^2
= \|(A - 1)y\|^2$ for every y.)

(c) U - 1 is invertible.

(d) If U is isometric and U - 1 is invertible, and if $A = (U + 1)(U - 1)^{-1}$,
then A is skew.

Each of A and U is known as the Cayley transform of the other. 


8. Suppose that U is a transformation (not assumed to be linear) that maps an
inner product space $\mathcal{V}$ onto itself (that is, if x is in $\mathcal{V}$, then Ux is in $\mathcal{V}$, and if y is
in $\mathcal{V}$, then y = Ux for some x in $\mathcal{V}$), in such a way that (Ux, Uy) = (x, y) for all
x and y.

(a) Prove that U is one-to-one and that if the inverse transformation is denoted
by $U^{-1}$, then $(U^{-1}x, U^{-1}y) = (x, y)$ and $(Ux, y) = (x, U^{-1}y)$ for all x and y.

(b) Prove that U is linear. (Hint: $(x, U^{-1}y)$ depends linearly on x.)

9. A conjugation is a transformation J (not assumed to be linear) that maps a 
unitary space onto itself and is such that $J^2 = 1$ and (Jx, Jy) = (y, x) for all x and y.

(a) Give an example of a conjugation. 

(b) Prove that (Jx, y) = (Jy, x). 

(c) Prove that J(x + y) = Jx + Jy.

(d) Prove that $J(\alpha x) = \bar\alpha \cdot Jx$.

10. A linear transformation A is said to be real with respect to a conjugation
J if AJ = JA.

(a) Give an example of a Hermitian transformation that is not real, and give 
an example of a real transformation that is not Hermitian. 

(b) If A is real, then the spectrum of A is symmetric about the real axis. 

(c) If A is real, then so is A*. 

11. § 74, Theorem 2 shows that the triangular form can be achieved by an 
orthonormal basis; is the same thing true for the Jordan form? 

12. If tr A = 0, then there exists an isometric matrix U such that all the diagonal
entries of $[U]^{-1}[A][U]$ are zero. (Hint: see § 56, Ex. 6.)

§ 75. Perpendicular projections 

We are now in a position to fulfill our earlier promise to investigate the
projections associated with the particular direct sum decompositions $\mathcal{V} =
\mathcal{M} \oplus \mathcal{M}^\perp$. We shall call such a projection a perpendicular projection.
Since $\mathcal{M}^\perp$ is uniquely determined by the subspace $\mathcal{M}$, we need not specify
both the direct summands associated with a projection if we already know
that it is perpendicular. We shall call the (perpendicular) projection E on
$\mathcal{M}$ along $\mathcal{M}^\perp$ simply the projection on $\mathcal{M}$ and we shall write $E = P_{\mathcal{M}}$.

Theorem 1. A linear transformation E is a perpendicular projection if
and only if $E = E^2 = E^*$. Perpendicular projections are positive linear
transformations and have the property that $\|Ex\| \le \|x\|$ for all x.

proof. If E is a perpendicular projection, then § 45, Theorem 1 and the
theorem of § 20 show (after, of course, the usual replacements, such as $\mathcal{M}^\perp$
for $\mathcal{M}^0$ and A* for A') that E = E*. Conversely if $E = E^2 = E^*$, then
the idempotence of E assures us that E is the projection on $\mathcal{R}$ along $\mathcal{N}$,
where, of course, $\mathcal{R} = \mathcal{R}(E)$ and $\mathcal{N} = \mathcal{N}(E)$ are the range and the
null-space of E, respectively. Hence we need only show that $\mathcal{R}$ and $\mathcal{N}$ are
orthogonal. For this purpose let x be any element of $\mathcal{R}$ and y any element of
$\mathcal{N}$; the desired result follows from the relation

$$(x, y) = (Ex, y) = (x, E^*y) = (x, Ey) = 0.$$

The positive character of an E satisfying $E = E^2 = E^*$ follows from

$$(Ex, x) = (E^2x, x) = (Ex, E^*x) = (Ex, Ex) = \|Ex\|^2 \ge 0.$$

Applying this result to the perpendicular projection 1 - E, we see that

$$\|x\|^2 - \|Ex\|^2 = (x, x) - (Ex, x) = ([1 - E]x, x) \ge 0;$$

this concludes the proof of the theorem.
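
For a concrete instance of Theorem 1, consider the perpendicular projection on a line in the real plane. The sketch below is an illustration under stated assumptions: the space is real, vectors are coordinate lists in an orthonormal basis, the projection on the line spanned by u is computed by the familiar formula Ex = ((x, u)/(u, u))u, and exact rational arithmetic is used so that the identities hold without rounding. All function names are hypothetical.

```python
from fractions import Fraction

def projection_onto(u):
    """Matrix of the perpendicular projection on the line spanned by real u."""
    norm_sq = sum(Fraction(t) * t for t in u)
    return [[Fraction(a) * b / norm_sq for b in u] for a in u]

def matmul(a, b):
    """Matrix product of two square matrices of the same size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(a):
    """Transpose: the matrix of the adjoint in the real case."""
    return [list(col) for col in zip(*a)]

def apply(a, x):
    """Coordinates of Ax for a matrix a and coordinate vector x."""
    return [sum(r * t for r, t in zip(row, x)) for row in a]

def norm_sq(x):
    """Squared length of a real coordinate vector."""
    return sum(t * t for t in x)

E = projection_onto([1, 2])       # E = (1/5) * [[1, 2], [2, 4]]
idempotent = matmul(E, E) == E    # E = E^2
symmetric = transpose(E) == E     # E = E*

samples = [[3, -1], [0, 7], [1, 2]]
shrinks = [norm_sq(apply(E, x)) <= norm_sq(x) for x in samples]
```

For the last sample vector, which lies in the range of E, the inequality holds with equality.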

For some of the generalizations of our theory it is useful to know that
idempotence together with the last property mentioned in Theorem 1 is 
also characteristic of perpendicular projections. 

Theorem 2. If a linear transformation E is such that $E = E^2$ and
$\|Ex\| \le \|x\|$ for all x, then E = E*.

proof. We are to show that the range $\mathcal{R}$ and the null-space $\mathcal{N}$ of E are
orthogonal. If x is in $\mathcal{N}^\perp$, then y = Ex - x is in $\mathcal{N}$, since $Ey = E^2x - Ex
= Ex - Ex = 0$. Hence Ex = x + y with (x, y) = 0, so that

$$\|x\|^2 \ge \|Ex\|^2 = \|x\|^2 + \|y\|^2 \ge \|x\|^2,$$

and therefore y = 0. Consequently Ex = x, so that x is in $\mathcal{R}$; this proves
that $\mathcal{N}^\perp \subset \mathcal{R}$. Conversely, if z is in $\mathcal{R}$, so that Ez = z, we write z = x + y
with x in $\mathcal{N}^\perp$ and y in $\mathcal{N}$. Then z = Ez = Ex + Ey = Ex = x. (The
reason for the last equality is that x is in $\mathcal{N}^\perp$ and therefore in $\mathcal{R}$.) Hence z is
in $\mathcal{N}^\perp$, so that $\mathcal{R} \subset \mathcal{N}^\perp$, and therefore $\mathcal{R} = \mathcal{N}^\perp$.

We shall need also the fact that the theorem of § 42 remains true if the 
word “projection” is qualified throughout by “perpendicular.” This is an 
immediate consequence of the preceding characterization of perpendicular 
projections and of the fact that sums and differences of self-adjoint trans- 
formations are self-adjoint, whereas the product of two self-adjoint trans- 
formations is self-adjoint if and only if they commute. By our present 
geometric methods it is also quite easy to generalize the part of the theorem 
dealing with sums from two summands to any finite number. The generali- 
zation is most conveniently stated in terms of the concept of orthogonality 
for projections; we shall say that two (perpendicular) projections E and F 
are orthogonal if EF = 0. (Consideration of adjoints shows that this is 
equivalent to FE = 0.) The following theorem shows that the geometric 
language is justified. 

Theorem 3. Two perpendicular projections $E = P_{\mathcal{M}}$ and $F = P_{\mathcal{N}}$ are
orthogonal if and only if the subspaces $\mathcal{M}$ and $\mathcal{N}$ (that is, the ranges of E
and F) are orthogonal.

proof. If EF = 0, and if x and y are in the ranges of E and F
respectively, then

$$(x, y) = (Ex, Fy) = (x, E^*Fy) = (x, EFy) = 0.$$

If, conversely, $\mathcal{M}$ and $\mathcal{N}$ are orthogonal (so that $\mathcal{N} \subset \mathcal{M}^\perp$), then the fact
that Ex = 0 for x in $\mathcal{M}^\perp$ implies that EFx = 0 for all x (since Fx is in $\mathcal{N}$
and consequently in $\mathcal{M}^\perp$).


§ 76. Combinations of perpendicular projections

The sum theorem for perpendicular projections is now easy. 

Theorem 1. If E u • • •, E n are {'perpendicular) projections , then a neces- 
sary and sufficient condition that E = E\ + • • • + E n be a {perpendicular) 
projection is that EiEj = 0 whenever i j {that is, that the Ei be pairwise 
orthogonal). 

proof. The proof of the sufficiency of the condition is trivial; we prove 
explicitly its necessity only, so that we now assume that E is a perpendicular 
projection. If x belongs to the range of some $E_i$, then 

\[ \|x\|^2 \ge \|Ex\|^2 = (Ex, x) = \Bigl(\sum_j E_j x, x\Bigr) = \sum_j (E_j x, x) = \sum_j \|E_j x\|^2 \ge \|E_i x\|^2 = \|x\|^2, \]

so that we must have equality all along. Since, in particular, we must have 

\[ \sum_j \|E_j x\|^2 = \|E_i x\|^2, \]

it follows that $E_j x = 0$ whenever $j \ne i$. In other words, every x in the 
range of $E_i$ is in the null-space (and, consequently, is orthogonal to the 
range) of every $E_j$ with $j \ne i$; using § 75, Theorem 3, we draw the desired 
conclusion. 
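
The sum theorem is easy to test on small real matrices. The following Python sketch is an illustration only and is not part of the argument above; the helper names (matmul, matadd, projection_on_line, is_perpendicular_projection) and the particular vectors are assumptions chosen for the example. It uses exact rational arithmetic, and in the real case it takes "perpendicular projection" to mean a symmetric idempotent matrix.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matadd(a, b):
    """Entrywise sum of two matrices."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def transpose(a):
    return [list(col) for col in zip(*a)]


def projection_on_line(v):
    """Perpendicular projection on the line spanned by the real vector v."""
    norm_sq = sum(Fraction(c * c) for c in v)
    return [[Fraction(vi * vj) / norm_sq for vj in v] for vi in v]


def is_perpendicular_projection(e):
    """Real case: idempotent (E E = E) and symmetric (E equals its transpose)."""
    return matmul(e, e) == e and transpose(e) == e


zero = [[Fraction(0)] * 3 for _ in range(3)]

E1 = projection_on_line([1, 1, 0])
E2 = projection_on_line([1, -1, 0])   # range orthogonal to the range of E1
E3 = projection_on_line([1, 0, 0])    # range NOT orthogonal to the range of E1

orthogonal_sum = matadd(E1, E2)       # E1 E2 = 0, so this is a projection
skew_sum = matadd(E1, E3)             # E1 E3 != 0, so this is not

print(is_perpendicular_projection(orthogonal_sum))
print(is_perpendicular_projection(skew_sum))
```

Here E1 + E2 is the perpendicular projection on the plane spanned by (1, 1, 0) and (1, -1, 0), while E1 + E3 fails to be idempotent, in agreement with the necessity part of the theorem.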

We end our discussion of projections with a brief study of order relations. 
It is tempting to write $E \le F$, for two perpendicular projections $E = P_{\mathfrak{M}}$ 
and $F = P_{\mathfrak{N}}$, whenever $\mathfrak{M} \subset \mathfrak{N}$. Earlier, however, we interpreted the sign 
$\le$, when used in an expression involving linear transformations E and F 
(as in $E \le F$), to mean that F - E is a positive transformation. There 
are also other possible reasons for considering E to be smaller than F; we 
might have $\|Ex\| \le \|Fx\|$ for all x, or FE = EF = E (see § 42, (ii)). 
The situation is straightened out by the following theorem, which plays 
here a role similar to that of § 75, Theorem 3, that is, it establishes the coin- 
cidence of several seemingly different concepts concerning projections, some 
of which are defined algebraically while others refer to the underlying geo- 
metrical objects. 

Theorem 2. For perpendicular projections $E = P_{\mathfrak{M}}$ and $F = P_{\mathfrak{N}}$ the following 
conditions are mutually equivalent. 

(i) $E \le F$. 

(ii) $\|Ex\| \le \|Fx\|$ for all x. 

(iii) $\mathfrak{M} \subset \mathfrak{N}$. 

(iva) FE = E. 

(ivb) EF = E. 

proof. We shall prove the implication relations (i) $\Rightarrow$ (ii) $\Rightarrow$ (iii) $\Rightarrow$ 
(iva) $\Rightarrow$ (ivb) $\Rightarrow$ (i). 

(i) $\Rightarrow$ (ii). If $E \le F$, then, for all x, 

\[ 0 \le ([F - E]x, x) = (Fx, x) - (Ex, x) = \|Fx\|^2 - \|Ex\|^2 \]

(since E and F are perpendicular projections). 

(ii) $\Rightarrow$ (iii). We assume that $\|Ex\| \le \|Fx\|$ for all x. Let us now 
take any x in $\mathfrak{M}$; then we have 

\[ \|x\| \ge \|Fx\| \ge \|Ex\| = \|x\|, \]

so that $\|Fx\| = \|x\|$, or (x, x) - (Fx, x) = 0, whence 

\[ ([1 - F]x, x) = \|(1 - F)x\|^2 = 0, \]

and consequently x = Fx. In other words, x in $\mathfrak{M}$ implies that x is in $\mathfrak{N}$, 
as was to be proved. 

(iii) $\Rightarrow$ (iva). If $\mathfrak{M} \subset \mathfrak{N}$, then Ex is in $\mathfrak{N}$ for all x, so that FEx = Ex 
for all x, as was to be proved. 

That (iva) implies (ivb), and is in fact equivalent to it, follows by taking 
adjoints. 

(iv) $\Rightarrow$ (i). If EF = FE = E, then, for all x, 

\[ (Fx, x) - (Ex, x) = (Fx, x) - (FEx, x) = (F[1 - E]x, x). \]

Since E and F are commutative projections, so also are (1 - E) and F, 
and consequently G = F(1 - E) is a projection. Hence 

\[ (Fx, x) - (Ex, x) = (Gx, x) = \|Gx\|^2 \ge 0. \]

This completes the proof of Theorem 2. 
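
The equivalences of Theorem 2 can be observed on a concrete pair of projections. The Python sketch below is only an illustration under stated assumptions: E is taken to be the perpendicular projection on the line spanned by (1, 1, 0) in real 3-space, F the perpendicular projection on the plane of vectors with third coordinate 0, and the helper names (matmul, matsub, apply, norm_sq) and the sample vectors are hypothetical choices made for the example.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matsub(a, b):
    """Entrywise difference of two matrices."""
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def apply(a, x):
    """Image of the vector x under the matrix a."""
    return [sum(row[k] * x[k] for k in range(len(x))) for row in a]


def norm_sq(x):
    """Square of the norm of a real vector."""
    return sum(c * c for c in x)


half = Fraction(1, 2)
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
E = [[half, half, 0], [half, half, 0], [0, 0, 0]]   # range: line through (1, 1, 0)
F = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]               # range: plane z = 0

# The range of E lies in the range of F, so (iva) and (ivb) should hold.
G = matmul(F, matsub(identity, E))                  # G = F(1 - E) = F - E

samples = [[3, -1, 2], [0, 5, 1], [-2, -2, 7]]
for x in samples:
    # condition (ii): the norm of Ex never exceeds the norm of Fx
    print(norm_sq(apply(E, x)), norm_sq(apply(F, x)))
```

In this example G = F - E is itself a projection, which is the step used in the proof of (iv) $\Rightarrow$ (i).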

In terms of the concepts introduced by now, it is possible to give a quite 
intuitive sounding formulation of the theorem of § 42 (in so far as it applies 
to perpendicular projections), as follows. For two perpendicular projec- 
tions E and F, their sum, product, or difference is also a perpendicular 
projection if and only if F is respectively orthogonal to, commutative with, 
or greater than E. 


EXERCISES 

1. (a) Give an example of a projection that is not a perpendicular projection. 

(b) Give an example of two projections E and F (they cannot both be 
perpendicular) such that EF = 0 and $FE \ne 0$. 

2. Find the (perpendicular) projection of (1, 1, 1) on the (one-dimensional) 
subspace of $\mathcal{C}^3$ spanned by (1, -1, 1). (In other words: find the image of the given 
vector under the projection onto the given subspace.) 

3. Find the matrices of all perpendicular projections on $\mathcal{C}^2$. 

4. If U = 2E - 1, then a necessary and sufficient condition that U be an 
involutory isometry is that E be a perpendicular projection. 

5. A linear transformation U is called a partial isometry if there exists a 
subspace $\mathfrak{M}$ such that $\|Ux\| = \|x\|$ whenever x is in $\mathfrak{M}$ and Ux = 0 whenever 
x is in $\mathfrak{M}^\perp$. 

(a) The adjoint of a partial isometry is a partial isometry. 

(b) If U is a partial isometry and if $\mathfrak{M}$ is a subspace such that $\|Ux\| = \|x\|$ 
or 0 according as x is in $\mathfrak{M}$ or in $\mathfrak{M}^\perp$, then $U^*U$ is the perpendicular projection on 
$\mathfrak{M}$. 

(c) Each of the following four conditions is necessary and sufficient that a linear 
transformation U be a partial isometry: (i) $UU^*U = U$, (ii) $U^*U$ is a projection, 
(iii) $U^*UU^* = U^*$, (iv) $UU^*$ is a projection. 

(d) If $\lambda$ is a proper value of a partial isometry, then $|\lambda| \le 1$. 

(e) Give an example of a partial isometry that has $\tfrac{1}{2}$ as a proper value. 

6. Suppose that A is a linear transformation on, and $\mathfrak{M}$ is a subspace of, a 
finite-dimensional vector space $\mathcal{V}$. Prove that if $\dim \mathfrak{M} \le \dim \mathfrak{M}^\perp$, then there exist 
linear transformations B and C on $\mathcal{V}$ such that Ax = (BC - CB)x for all x in 
$\mathfrak{M}$. (Hint: let B be a partial isometry such that $\|Bx\| = \|x\|$ or 0 according as 
x is in $\mathfrak{M}$ or in $\mathfrak{M}^\perp$ and such that $\mathcal{R}(B) \subset \mathfrak{M}^\perp$.) 

§ 77. Complexification 

In the past few sections we have been treating real and complex vector 
spaces simultaneously. Sometimes this is not possible; the complex num- 
ber system is richer than the real. There are theorems that are true for 
both real and complex spaces, but for which the proof is much easier in 
the complex case, and there are theorems that are true for complex spaces 
but not for real ones. (An example of the latter kind is the assertion that 
if the space is finite-dimensional, then every linear transformation has a 
proper value.) For these reasons, it is frequently handy to be able to 
“complexify” a real vector space, that is, to associate with it a complex 
vector space with essentially the same properties. The purpose of this 
section is to describe such a process of complexification. 

Suppose that $\mathcal{V}$ is a real vector space, and let $\mathcal{V}^+$ be the set of all ordered 
pairs $\langle x, y \rangle$ with both x and y in $\mathcal{V}$. Define the sum of two elements of 
$\mathcal{V}^+$ by 

\[ \langle x_1, y_1 \rangle + \langle x_2, y_2 \rangle = \langle x_1 + x_2, y_1 + y_2 \rangle, \]

and define the product of an element of $\mathcal{V}^+$ by a complex number $\alpha + i\beta$ 
($\alpha$ and $\beta$ real, $i = \sqrt{-1}$) by 

\[ (\alpha + i\beta)\langle x, y \rangle = \langle \alpha x - \beta y, \beta x + \alpha y \rangle. \]

(To remember these formulas, pretend that $\langle x, y \rangle$ means x + iy.) A 
straightforward and only slightly laborious computation shows that the 
set $\mathcal{V}^+$ becomes a complex vector space with respect to these definitions 
of the linear operations. 

The set of those elements $\langle x, y \rangle$ of $\mathcal{V}^+$ for which y = 0 is in a natural 
one-to-one correspondence with the space $\mathcal{V}$. Being a complex vector 
space, the space $\mathcal{V}^+$ may also be regarded as a real vector space; if we 
identify each element x of $\mathcal{V}$ with its replica $\langle x, 0 \rangle$ in $\mathcal{V}^+$ (it is exceedingly 
convenient to do this), we may say that $\mathcal{V}^+$ (as a real vector space) includes 
$\mathcal{V}$. Since $\langle 0, y \rangle = i\langle y, 0 \rangle$, so that $\langle x, y \rangle = \langle x, 0 \rangle + i\langle y, 0 \rangle$, our 
identification convention enables us to say that every vector in $\mathcal{V}^+$ has 
the form x + iy, with x and y in $\mathcal{V}$. Since $\mathcal{V}$ and $i\mathcal{V}$ (where $i\mathcal{V}$ denotes the 
set of all elements $\langle x, y \rangle$ in $\mathcal{V}^+$ with x = 0) are subsets of $\mathcal{V}^+$ with only 
0 (that is, $\langle 0, 0 \rangle$) in common, it follows that the representation of a vector 
of $\mathcal{V}^+$ in the form x + iy (with x and y in $\mathcal{V}$) is unique. We have thus 
constructed a complex vector space $\mathcal{V}^+$ with the property that $\mathcal{V}^+$ considered 
as a real space includes $\mathcal{V}$ as a subspace, and such that $\mathcal{V}^+$ is the 
direct sum of $\mathcal{V}$ and $i\mathcal{V}$. (Here $i\mathcal{V}$ denotes the set of all those elements 
of $\mathcal{V}^+$ that have the form iy for some y in $\mathcal{V}$.) We shall call $\mathcal{V}^+$ the 
complexification of $\mathcal{V}$. 
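
The construction by ordered pairs translates directly into code. The following Python sketch is an illustration only; the class name Pair, the method name scale, and the sample vectors are assumptions made for the example, and the underlying real space is taken to be a space of integer coordinate tuples so that all checks are exact.

```python
class Pair:
    """An ordered pair <x, y>, thought of as x + iy, with x and y real tuples."""

    def __init__(self, x, y):
        self.x, self.y = tuple(x), tuple(y)

    def __add__(self, other):
        # <x1, y1> + <x2, y2> = <x1 + x2, y1 + y2>
        return Pair((a + b for a, b in zip(self.x, other.x)),
                    (a + b for a, b in zip(self.y, other.y)))

    def scale(self, alpha, beta):
        # (alpha + i beta)<x, y> = <alpha x - beta y, beta x + alpha y>
        return Pair((alpha * p - beta * q for p, q in zip(self.x, self.y)),
                    (beta * p + alpha * q for p, q in zip(self.x, self.y)))

    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)

    def __repr__(self):
        return f"<{self.x}, {self.y}>"


v = Pair((1, 2), (3, 4))
w = Pair((0, 1), (-1, 2))

# i<y, 0> = <0, y>, which is what justifies writing <x, y> as x + iy
print(Pair((3, 4), (0, 0)).scale(0, 1))

# (1 + 2i)((3 - i)v) should agree with ((1 + 2i)(3 - i))v = (5 + 5i)v
print(v.scale(3, -1).scale(1, 2), v.scale(5, 5))
```

Only a few of the vector space axioms are spot-checked here (one instance each of associativity of scalar multiplication and of distributivity); the sketch is no substitute for the "slightly laborious computation" mentioned in the text.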

If $\{x_1, \dots, x_n\}$ is a linearly independent set in $\mathcal{V}$ (real coefficients), 
then it is also a linearly independent set in $\mathcal{V}^+$ (complex coefficients). Indeed, 
if $\alpha_1, \dots, \alpha_n, \beta_1, \dots, \beta_n$ are real numbers such that $\sum_j (\alpha_j + i\beta_j) x_j = 0$, 
then $(\sum_j \alpha_j x_j) + i(\sum_j \beta_j x_j) = 0$, and consequently, by the 
uniqueness of the representation of vectors in $\mathcal{V}^+$ by means of vectors in 
$\mathcal{V}$, it follows that $\sum_j \alpha_j x_j = \sum_j \beta_j x_j = 0$; the desired result is now implied 
by the assumed (real) linear independence of $\{x_1, \dots, x_n\}$ in $\mathcal{V}$. If, moreover, 
$\{x_1, \dots, x_n\}$ is a basis in $\mathcal{V}$ (real coefficients), then it is also a basis 
in $\mathcal{V}^+$ (complex coefficients). Indeed, if x and y are in $\mathcal{V}$, then there exist 
real numbers $\alpha_1, \dots, \alpha_n, \beta_1, \dots, \beta_n$ such that $x = \sum_j \alpha_j x_j$ and 
$y = \sum_j \beta_j x_j$; it follows that $x + iy = \sum_j (\alpha_j + i\beta_j) x_j$, and hence that 
$\{x_1, \dots, x_n\}$ spans $\mathcal{V}^+$. These results imply that the complex vector space $\mathcal{V}^+$ 
has the same dimension as the real vector space $\mathcal{V}$. 

There is a natural way to extend every linear transformation A on $\mathcal{V}$ 
to a linear transformation $A^+$ on $\mathcal{V}^+$; we write 

\[ A^+(x + iy) = Ax + iAy \]

whenever x and y are in $\mathcal{V}$. (The verification that $A^+$ is indeed a linear 
transformation on $\mathcal{V}^+$ is routine.) A similar extension works for linear and 
even multilinear functionals. If, for instance, w is a (real) bilinear functional 
on $\mathcal{V}$, its extension to $\mathcal{V}^+$ is the (complex) bilinear functional defined 
by 

\[ w^+(x_1 + iy_1, x_2 + iy_2) = w(x_1, x_2) - w(y_1, y_2) + i\bigl(w(x_1, y_2) + w(y_1, x_2)\bigr). \]

If, on the other hand, w is alternating, then the same is true of $w^+$. Indeed, 
the real and imaginary parts of $w^+(x + iy, x + iy)$ are w(x, x) - w(y, y) 
and w(x, y) + w(y, x) respectively; if w is alternating, then w is skew symmetric 
(§ 30, Theorem 1), and therefore $w^+$ is alternating. The same proof 
establishes the corresponding result for k-linear functionals also, for all 
values of k. From this and from the definition of determinants it follows 
that $\det A = \det A^+$ for every linear transformation A on $\mathcal{V}$. 

The method of extending bilinear functionals works for conjugate bilinear 
functionals also. If, that is, $\mathcal{V}$ is a (real) inner product space, then 
there is a natural way of introducing a (complex) inner product into $\mathcal{V}^+$; 
we write, by definition, 

\[ (x_1 + iy_1, x_2 + iy_2) = (x_1, x_2) + (y_1, y_2) - i\bigl((x_1, y_2) - (y_1, x_2)\bigr). \]

Observe that if x and y are orthogonal vectors in $\mathcal{V}$, then 

\[ \|x + iy\|^2 = \|x\|^2 + \|y\|^2. \]

The correspondence from A to $A^+$ preserves all algebraic properties of 
transformations. Thus if $B = \alpha A$ (with $\alpha$ real), then $B^+ = \alpha A^+$; if 
C = A + B, then $C^+ = A^+ + B^+$; and if C = AB, then $C^+ = A^+B^+$. 
If, moreover, $\mathcal{V}$ is an inner product space, and if $B = A^*$, then $B^+ = (A^+)^*$. 
(Proof: evaluate $(A^+(x_1 + iy_1), (x_2 + iy_2))$ and $(x_1 + iy_1, B^+(x_2 + iy_2))$.) 

If A is a linear transformation on $\mathcal{V}$ and if $A^+$ has a proper vector 
x + iy, with proper value $\alpha + i\beta$ (where x and y are in $\mathcal{V}$ and $\alpha$ and $\beta$ are 
real), so that 

\[ Ax = \alpha x - \beta y, \qquad Ay = \beta x + \alpha y, \]

then the subspace of $\mathcal{V}$ spanned by x and y is invariant under A. (Since 
every linear transformation on a complex vector space has a proper vector, 
we conclude that every linear transformation on a real vector space leaves 
invariant a subspace of dimension equal to 1 or 2.) If, in particular, $A^+$ 
happens to have a real proper value (that is, if $\beta = 0$), then A has the same 
proper value (since $Ax = \alpha x$, $Ay = \alpha y$, and not both x and y can vanish). 
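
A small numerical instance may make the pair of real equations concrete. The Python sketch below is an illustration only; the matrix A, the vectors x and y, and the helper name apply are assumptions chosen for the example (they were picked so that x + iy is a proper vector of the complexified matrix with proper value 2 + i). Python's built-in complex type plays the role of the complex scalars.

```python
def apply(a, v):
    """Image of the vector v under the matrix a (entries may be complex)."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in a]


A = [[1, -2],
     [1, 3]]                      # a real matrix with no real proper values

x, y = [-1, 1], [1, 0]            # real and imaginary parts of the proper vector
alpha, beta = 2, 1                # proper value alpha + i beta = 2 + i

z = [complex(p, q) for p, q in zip(x, y)]   # the vector x + iy
lam = complex(alpha, beta)

# A+ (x + iy) = (alpha + i beta)(x + iy)
print(apply(A, z), [lam * c for c in z])

# The two real equations obtained by separating real and imaginary parts
print(apply(A, x), [alpha * p - beta * q for p, q in zip(x, y)])
print(apply(A, y), [beta * p + alpha * q for p, q in zip(x, y)])
```

Since Ax and Ay are again real linear combinations of x and y, the plane spanned by x and y is invariant under A, as the text asserts.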

We have already seen that every (real) basis in $\mathcal{V}$ is at the same time 
a (complex) basis in $\mathcal{V}^+$. It follows that the matrix of a linear transformation 
A on $\mathcal{V}$, with respect to some basis $\mathcal{X}$ in $\mathcal{V}$, is the same as the matrix 
of $A^+$ on $\mathcal{V}^+$, with respect to the basis $\mathcal{X}$ in $\mathcal{V}^+$. This comment is at the 
root of the whole theory of complexification; the naive point of view on the 
matter is that real matrices constitute a special case of complex matrices. 


EXERCISES 

1. What happens if the process of complexification described in § 77 is applied 
to a vector space that is already complex? 

2. Prove that there exists a unique isomorphism between the complexification 
described in § 77 and the one described in § 25, Ex. 5 with the property that each 
“real” vector (that is, each vector in the originally given real vector space) cor- 
responds to itself. 

3. (a) What is the complexification of $\mathcal{R}^1$? 

(b) If $\mathcal{V}$ is an n-dimensional real vector space, what is the dimension of its 
complexification $\mathcal{V}^+$, regarded as a real vector space? 

4. Suppose that $\mathcal{V}^+$ is the complex inner product space obtained by complexifying 
a real inner product space $\mathcal{V}$. 

(a) Prove that if $\mathcal{V}^+$ is regarded as a real vector space and if A(x + iy) = x - iy 
whenever x and y are in $\mathcal{V}$, then A is a linear transformation on $\mathcal{V}^+$. 

(b) Is A self-adjoint? Isometric? Idempotent? Involutory? 

(c) What if $\mathcal{V}^+$ is regarded as a complex space? 

5. Discuss the relation between duality and complexification, and, in particular, 
the relation between the adjoint of a linear transformation on a real vector space 
and the adjoint of its complexification. 

6. If A is a linear transformation on a real vector space $\mathcal{V}$ and if a subspace $\mathfrak{M}$ 
of the complexification $\mathcal{V}^+$ is invariant under $A^+$, then $\mathfrak{M} \cap \mathcal{V}$ is invariant under 
A. 


§ 78. Characterization of spectra 

The following results support the analogy between numbers and trans- 
formations more than anything so far; they assert that the properties that 
caused us to define the special classes of transformations we have been 
considering are reflected by their spectra. 

Theorem 1. If A is a self-adjoint transformation on an inner product 
space, then every proper value of A is real; if A is positive, or strictly positive, 
then every proper value of A is positive, or strictly positive, respectively. 

proof. We may ignore the fact that the first assertion is trivial in the 
real case; the same proof serves to establish both assertions in both the 
real and the complex case. Indeed, if $Ax = \lambda x$, with $x \ne 0$, then 

\[ \frac{(Ax, x)}{\|x\|^2} = \frac{\lambda (x, x)}{\|x\|^2} = \lambda; \]

it follows that if (Ax, x) is real (see § 71, Theorem 4), then so is $\lambda$, and if 
(Ax, x) is positive (or strictly positive) then so is $\lambda$. 

Theorem 2. Every root of the characteristic equation of a self-adjoint 
transformation on a finite-dimensional inner product space is real. 

proof. In the complex case roots of the characteristic equation are the 
same thing as proper values, and the result follows from Theorem 1. 
If A is a symmetric transformation on a Euclidean space, then its 
complexification $A^+$ is Hermitian, and the result follows from the fact that 
A and $A^+$ have the same characteristic equation. 

We observe that it is an immediate consequence of Theorem 2 that a 
self-adjoint transformation on a finite-dimensional inner product space 
always has a proper value. 

Theorem 3. Every proper value of an isometry has absolute value one. 

proof. If U is an isometry, and if $Ux = \lambda x$, with $x \ne 0$, then 
$\|x\| = \|Ux\| = |\lambda| \cdot \|x\|$. 

Theorem 4. If A is either self-adjoint or isometric, then proper vectors 
of A belonging to distinct proper values are orthogonal. 

proof. Suppose $Ax_1 = \lambda_1 x_1$, $Ax_2 = \lambda_2 x_2$, $\lambda_1 \ne \lambda_2$. If A is self-adjoint, 
then 

(1) \[ \lambda_1 (x_1, x_2) = (Ax_1, x_2) = (x_1, Ax_2) = \lambda_2 (x_1, x_2). \]

(The middle step makes use of the self-adjoint character of A, and the last 
step of the reality of $\lambda_2$.) In case A is an isometry, (1) is replaced by 

(2) \[ (x_1, x_2) = (Ax_1, Ax_2) = (\lambda_1 / \lambda_2)(x_1, x_2); \]

recall that $\bar{\lambda}_2 = 1/\lambda_2$. In either case $(x_1, x_2) \ne 0$ would imply that 
$\lambda_1 = \lambda_2$, so that we must have $(x_1, x_2) = 0$. 

Theorem 5. If a subspace $\mathfrak{M}$ is invariant under an isometry U on a 
finite-dimensional inner product space, then so is $\mathfrak{M}^\perp$. 

proof. Considered on the finite-dimensional subspace $\mathfrak{M}$, the transformation 
U is still an isometry, and, consequently, it is invertible. It 
follows that every x in $\mathfrak{M}$ may be written in the form x = Uy with y 
in $\mathfrak{M}$; in other words, if x is in $\mathfrak{M}$ and if $y = U^{-1}x$, then y is in $\mathfrak{M}$. Hence 
$\mathfrak{M}$ is invariant under $U^{-1} = U^*$. It follows from § 45, Theorem 2, that 
$\mathfrak{M}^\perp$ is invariant under $(U^*)^* = U$. 

We observe that the same result for self-adjoint transformations (even 
in not necessarily finite-dimensional spaces) is trivial, since if $\mathfrak{M}$ is invariant 
under A, then $\mathfrak{M}^\perp$ is invariant under $A^* = A$. 

Theorem 6. If A is a self-adjoint transformation on a finite-dimensional 
inner product space, then the algebraic multiplicity of each proper value 
$\lambda_0$ of A is equal to its geometric multiplicity, that is, to the dimension of the 
subspace $\mathfrak{M}$ of all solutions of $Ax = \lambda_0 x$. 

proof. It is clear that $\mathfrak{M}$ is invariant under A, and therefore so is $\mathfrak{M}^\perp$; 
let us denote by B and C the linear transformation A considered only on 
$\mathfrak{M}$ and $\mathfrak{M}^\perp$ respectively. We have 

\[ \det(A - \lambda) = \det(B - \lambda) \cdot \det(C - \lambda) \]

for all $\lambda$. Since B is a self-adjoint transformation on a finite-dimensional 
space, with only one proper value, namely, $\lambda_0$, it follows that $\lambda_0$ must 
occur as a proper value of B with algebraic multiplicity equal to the dimension 
of $\mathfrak{M}$. If that dimension is m, then $\det(B - \lambda) = (\lambda_0 - \lambda)^m$. 
Since, on the other hand, $\lambda_0$ is not a proper value of C at all, and since, 
consequently, $\det(C - \lambda_0) \ne 0$, we see that $\det(A - \lambda)$ contains $(\lambda_0 - \lambda)$ 
as a factor exactly m times, as was to be proved. 

What made this proof work was the invariance of $\mathfrak{M}^\perp$ and the fact 
that every root of the characteristic equation of A is a proper value of A. 
The latter assertion is true for every linear transformation on a unitary 
space; the following result is a consequence of these observations and of 
Theorem 5. 

Theorem 7. If U is a unitary transformation on a finite-dimensional 
unitary space, then the algebraic multiplicity of each proper value of U is 
equal to its geometric multiplicity. 


EXERCISES 

1. Give an example of a linear transformation with two non-orthogonal proper 
vectors belonging to distinct proper values. 

2. Give an example of a non-positive linear transformation (on a finite-di- 
mensional unitary space) all of whose proper values are positive. 

3. (a) If A is self-adjoint, then det A is real. 

(b) If A is unitary, then | det A | = 1. 

(c) What can be said about the determinant of a partial isometry? 


§ 79. Spectral theorem 

We are now ready to prove the main theorem of this book, the theorem 
of which many of the other results of this chapter are immediate corollaries. 
To some extent what we have been doing up to now was a matter of 
sport (useful, however, for generalizations) ; we wanted to show how much 
can conveniently be done with spectral theory before proving the spectral 
theorem. In the complex case, incidentally, the spectral theorem can be 
made to follow from the triangularization process we have already described; 
because of the importance of the theorem we prefer to give below its 
(quite easy) direct proof. The reader may find it profitable to adapt the 
method of proof (not the result) of § 56, Theorem 2, to prove as much as 
he can of the spectral theorem and its consequences. 

Theorem 1. To every self-adjoint linear transformation A on a finite-dimensional 
inner product space there correspond real numbers $\alpha_1, \dots, \alpha_r$ 
and perpendicular projections $E_1, \dots, E_r$ (where r is a strictly positive 
integer, not greater than the dimension of the space) so that 

(1) the $\alpha_j$ are pairwise distinct, 

(2) the $E_j$ are pairwise orthogonal and different from 0, 

(3) $\sum_j E_j = 1$, 

(4) $\sum_j \alpha_j E_j = A$. 

proof. Let $\alpha_1, \dots, \alpha_r$ be the distinct proper values of A, and let $E_j$ 
be the perpendicular projection on the subspace consisting of all solutions 
of $Ax = \alpha_j x$ (j = 1, ..., r). Condition (1) is then satisfied by definition; 
the fact that the $\alpha$'s are real follows from § 78, Theorem 1. Condition (2) 
follows from § 78, Theorem 4. From the orthogonality of the $E_j$ we infer 
that if $E = \sum_j E_j$, then E is a perpendicular projection. The dimension 
of the range of E is the sum of the dimensions of the ranges of the $E_j$, 
and consequently, by § 78, Theorem 6, the dimension of the range of E 
is equal to the dimension of the entire space; this implies (3). (Alternatively, 
if $E \ne 1$, then A considered on the range of 1 - E would be a 
self-adjoint transformation with no proper values.) To prove (4), take any 
vector x and write $x_j = E_j x$; it follows that $Ax_j = \alpha_j x_j$ and hence that 

\[ Ax = A\Bigl(\sum_j E_j x\Bigr) = \sum_j Ax_j = \sum_j \alpha_j x_j = \sum_j \alpha_j E_j x. \]

This completes the proof of the spectral theorem. 
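
For a concrete instance of the four conditions, consider a 2-by-2 real symmetric matrix. The Python sketch below is an illustration only; the matrix A, its proper values 1 and 3 with proper vectors (1, -1) and (1, 1), and the helper names (matmul, matadd, scaled, projection_on_line) are assumptions chosen for the example, not part of the proof.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matadd(a, b):
    """Entrywise sum of two matrices."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def scaled(c, a):
    """The matrix a multiplied by the scalar c."""
    return [[c * entry for entry in row] for row in a]


def projection_on_line(v):
    """Perpendicular projection on the line spanned by the real vector v."""
    norm_sq = sum(Fraction(c * c) for c in v)
    return [[Fraction(vi * vj) / norm_sq for vj in v] for vi in v]


A = [[2, 1],
     [1, 2]]
alphas = [1, 3]                                   # the distinct proper values
E = [projection_on_line([1, -1]),                 # solutions of Ax = 1x
     projection_on_line([1, 1])]                  # solutions of Ax = 3x

identity = [[1, 0], [0, 1]]
zero = [[0, 0], [0, 0]]

total = matadd(E[0], E[1])                                  # condition (3)
recombined = matadd(scaled(alphas[0], E[0]),
                    scaled(alphas[1], E[1]))                # condition (4)

print(total == identity, recombined == A)
```

Each range here is one-dimensional, so the two projections already account for the whole space, which is the dimension count used in the proof of (3).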

The representation $A = \sum_j \alpha_j E_j$ (where the $\alpha$'s and the E's satisfy the 
conditions (1)-(3) of Theorem 1) is called a spectral form of A; the main 
effect of the following result is to prove the uniqueness of the spectral form. 

Theorem 2. If $\sum_{j=1}^{r} \alpha_j E_j$ is the spectral form of a self-adjoint transformation 
A on a finite-dimensional inner product space, then the $\alpha$'s are all the 
distinct proper values of A. If, moreover, $1 \le k \le r$, then there exist 
polynomials $p_k$, with real coefficients, such that $p_k(\alpha_j) = 0$ whenever $j \ne k$ 
and such that $p_k(\alpha_k) = 1$; for every such polynomial $p_k(A) = E_k$. 

proof. Since $E_j \ne 0$, there exists a non-zero vector x in the range of $E_j$. Since 
$E_j x = x$ and $E_i x = 0$ whenever $i \ne j$, it follows that 

\[ Ax = \sum_i \alpha_i E_i x = \alpha_j E_j x = \alpha_j x, \]

so that each $\alpha_j$ is a proper value of A. If, conversely, $\lambda$ is any proper value 
of A, say $Ax = \lambda x$ with $x \ne 0$, then we write $x_j = E_j x$ and we see that 

\[ Ax = \lambda x = \lambda \sum_j x_j \]

and 

\[ Ax = A \sum_j x_j = \sum_j \alpha_j x_j, \]

so that $\sum_j (\lambda - \alpha_j) x_j = 0$. Since the $x_j$ are pairwise orthogonal, those 
among them that are not zero form a linearly independent set. It follows 
that, for each j, either $x_j = 0$ or else $\lambda = \alpha_j$. Since $x \ne 0$, we must have 
$x_j \ne 0$ for some j, and consequently $\lambda$ is indeed equal to one of the $\alpha$'s. 
Since $E_i E_j = 0$ if $i \ne j$, and $E_j^2 = E_j$, it follows that 

\[ A^2 = \Bigl(\sum_i \alpha_i E_i\Bigr)\Bigl(\sum_j \alpha_j E_j\Bigr) = \sum_i \sum_j \alpha_i \alpha_j E_i E_j = \sum_j \alpha_j^2 E_j. \]

Similarly 

\[ A^n = \sum_j \alpha_j^n E_j \]

for every positive integer n (in case n = 0, use (3)), and hence 

\[ p(A) = \sum_j p(\alpha_j) E_j \]

for every polynomial p. To conclude the proof of the theorem, all we need 
to do is to exhibit a (real) polynomial $p_k$ such that $p_k(\alpha_j) = 0$ whenever 
$j \ne k$ and such that $p_k(\alpha_k) = 1$. If we write 

\[ p_k(t) = \prod_{j \ne k} \frac{t - \alpha_j}{\alpha_k - \alpha_j}, \]

then $p_k$ is a polynomial with all the required properties. 
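
The interpolation polynomials can be evaluated at a matrix directly, which gives a way of computing the projections of the spectral form without finding any proper vectors. The Python sketch below is an illustration only; the function name lagrange_projection, the other helper names, and the 3-by-3 symmetric matrix with proper values 1, 3, 5 are assumptions chosen for the example.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def shifted(a, c):
    """The matrix a - c (that is, a minus c times the identity)."""
    n = len(a)
    return [[a[i][j] - (c if i == j else 0) for j in range(n)] for i in range(n)]


def lagrange_projection(a, alphas, k):
    """Evaluate p_k(a), where p_k(t) is the product over j != k of
    (t - alphas[j]) / (alphas[k] - alphas[j])."""
    result = identity(len(a))
    for j, alpha_j in enumerate(alphas):
        if j != k:
            factor = Fraction(1, alphas[k] - alpha_j)
            term = [[factor * entry for entry in row] for row in shifted(a, alpha_j)]
            result = matmul(result, term)
    return result


A = [[2, 1, 0],
     [1, 2, 0],
     [0, 0, 5]]
alphas = [1, 3, 5]          # the distinct proper values of A (integers here)

projections = [lagrange_projection(A, alphas, k) for k in range(len(alphas))]
for alpha, E_k in zip(alphas, projections):
    print(alpha, E_k)
```

The factors of p_k(A) commute with one another, since each is a polynomial in A, so the order in which they are multiplied is immaterial.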

Theorem 3. If $\sum_{j=1}^{r} \alpha_j E_j$ is the spectral form of a self-adjoint transformation 
A on a finite-dimensional inner product space, then a necessary 
and sufficient condition that a linear transformation B commute with A 
is that it commute with each $E_j$. 

proof. The sufficiency of the condition is trivial; if $A = \sum_j \alpha_j E_j$ and 
$E_j B = BE_j$ for all j, then AB = BA. Necessity follows from Theorem 2; 
if B commutes with A, then B commutes with every polynomial in A, and 
therefore B commutes with each $E_j$. 

Before exploiting the spectral theorem any further, we remark on its 
matricial interpretation. If we choose an orthonormal basis in the range 
of each $E_j$, then the totality of the vectors in these little bases is a basis 
for the whole space; expressed in this basis the matrix of A will be diagonal. 
The fact that by a suitable choice of an orthonormal basis the matrix 
of a self-adjoint transformation can be made diagonal, or, equivalently, 
that any self-adjoint matrix can be isometrically transformed (that is, 
replaced by $U^{-1}AU$, where U is an isometry) into a diagonal matrix, 
already follows (in the complex case) from the theory of the triangular 
form. We gave the algebraic version for two reasons. First, it is this 
version that generalizes easily to the infinite-dimensional case, and, 
second, even in the finite-dimensional case, writing $\sum_j \alpha_j E_j$ often has 
great notational and typographical advantages over the matrix notation. 

We shall make use of the fact that a not necessarily self-adjoint transformation 
A is isometrically diagonable (that is, that its matrix with respect 
to a suitable orthonormal basis is diagonal) if and only if conditions (1)-(4) 
of Theorem 1 hold for it. Indeed, if we have (1)-(4), then the proof of 
diagonability, given for self-adjoint transformations, applies; the converse 
we leave as an exercise for the reader. 


EXERCISES 

1. Suppose that A is a linear transformation on a complex inner product space. 
Prove that if A is Hermitian, then the linear factors of the minimal polynomial of 
A are distinct. Is the converse true? 

2. (a) Two linear transformations A and B on a unitary space are unitarily 
equivalent if there exists a unitary transformation U such that $A = U^{-1}BU$. 
(The corresponding concept in the real case is called orthogonal equivalence.) Prove 
that unitary equivalence is an equivalence relation. 

(b) Are A*A and A A* always unitarily equivalent? 

(c) Are A and A* always unitarily equivalent? 

3. Which of the following pairs of matrices are unitarily equivalent? 

» c i) - c »)• 

( 0 0 1 \ \ 0 \ 

0 0 0] and H | 0 ]• 

10 0/ \0 0 -1/ 

/ 0 1 0\ /-I 0 0\ 

(c) ( — 1 0 0] and ( 0 * 0j- 

\ 0 0 -1/ V 0 0 if 

(d) $\begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix}$ and $\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$. 

4. If two linear transformations are unitarily equivalent, then they are similar, 
and they are congruent; if two linear transformations are either similar or 
congruent, then they are equivalent. Show by examples that these implication 
relations are the only ones that hold among these concepts. 


§ 80. Normal transformations 

The easiest (and at the same time the most useful) generalizations of 
the spectral theorem apply to complex inner product spaces (that is, 
unitary spaces). In order to avoid irrelevant complications, in this section 
we exclude the real case and concentrate attention on unitary spaces 
only. 

We have seen that every Hermitian transformation is diagonable, and 
that an arbitrary transformation A may be written in the form B + iC, 
with B and C Hermitian; why isn't it true that simply by diagonalizing 
B and C separately we can diagonalize A? The answer is, of course, that 
diagonalization involves the choice of a suitable orthonormal basis, and 
there is no reason to expect that a basis that diagonalizes B will have the 
same effect on C. It is of considerable importance to know the precise 
class of transformations for which the spectral theorem is valid, and 
fortunately this class is easy to describe. 

We shall call a linear transformation A normal if it commutes with its 
adjoint, $A^*A = AA^*$. (This definition makes sense, and is used, in both 
real and complex inner product spaces; we shall, however, continue to use 
techniques that are inextricably tied up with the complex case.) We 
point out first that A is normal if and only if its real and imaginary parts 
commute. Suppose, indeed, that A is normal and that A = B + iC with 
B and C Hermitian; since $B = \frac{1}{2}(A + A^*)$ and $C = \frac{1}{2i}(A - A^*)$, it is 
clear that BC = CB. If, conversely, BC = CB, then the two relations 
A = B + iC and $A^* = B - iC$ imply that A is normal. We observe that 
Hermitian and unitary transformations are normal. 

The class of transformations possessing a spectral form in the sense of 
§ 79 is precisely the class of normal transformations. Half of this statement 
is easy to prove: if $A = \sum_j \alpha_j E_j$, then $A^* = \sum_j \bar{\alpha}_j E_j$, and it takes merely 
a simple computation to show that $A^*A = AA^* = \sum_j |\alpha_j|^2 E_j$. To prove 
the converse, that is, to prove that normality implies the existence of a 
spectral form, we have two alternatives. We could derive this result from 
the spectral theorem for Hermitian transformations, using the real and 
imaginary parts, or we could prove that the essential lemmas of § 78, on 
which the proof of the Hermitian case rests, are just as valid for an arbitrary 
normal transformation. Because its methods are of some interest, we 
adopt the second procedure. We observe that the machinery needed to 
prove the lemmas that follow was available to us in § 78, so that we 
could have stated the spectral theorem for normal transformations 
immediately; the main reason we traveled the present course was to motivate 
the definition of normality. 

Theorem 1. If A is normal, then a necessary and sufficient condition that 
x be a proper vector of A is that it be a proper vector of $A^*$; if $Ax = \lambda x$, 
then $A^*x = \bar{\lambda} x$. 

proof. We observe that the normality of A implies that 

\[ \|Ax\|^2 = (Ax, Ax) = (A^*Ax, x) = (AA^*x, x) = (A^*x, A^*x) = \|A^*x\|^2. \]

Since $A - \lambda$ is normal along with A, and since $(A - \lambda)^* = A^* - \bar{\lambda}$, we 
obtain the relation 

\[ \|Ax - \lambda x\| = \|A^*x - \bar{\lambda} x\|, \]

from which the assertions of the theorem follow immediately. 
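
The two facts used in this proof are easy to watch on a small example. The Python sketch below is an illustration only; the matrices N and J, the proper vector v, and the helper names (matmul, adjoint, apply, norm_sq, is_normal) are assumptions chosen for the example. N is normal with proper value 1 + i, while J is a standard non-normal matrix included for contrast.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def adjoint(a):
    """Conjugate transpose of a square matrix."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]


def apply(a, v):
    """Image of the vector v under the matrix a."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in a]


def norm_sq(v):
    """Square of the norm of a complex vector."""
    return sum((c * c.conjugate()).real for c in v)


def is_normal(a):
    return matmul(adjoint(a), a) == matmul(a, adjoint(a))


N = [[1, -1],
     [1, 1]]                 # commutes with its adjoint
J = [[0, 1],
     [0, 0]]                 # does not commute with its adjoint

v = [1, -1j]                 # proper vector of N
lam = 1 + 1j                 # N v = lam v

x = [2, 3 - 1j]              # an arbitrary test vector
print(norm_sq(apply(N, x)), norm_sq(apply(adjoint(N), x)))
print(apply(adjoint(N), v), [lam.conjugate() * c for c in v])
```

For the non-normal matrix J the vector (1, 0) is a proper vector with proper value 0, but it is not a proper vector of the adjoint of J, so the hypothesis of normality cannot simply be dropped.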

Theorem 2. If A is normal, then proper vectors belonging to distinct 
proper values are orthogonal. 

proof. If $Ax_1 = \lambda_1 x_1$ and $Ax_2 = \lambda_2 x_2$, then 

\[ \lambda_1 (x_1, x_2) = (Ax_1, x_2) = (x_1, A^*x_2) = \lambda_2 (x_1, x_2). \]

This theorem generalizes § 78, Theorem 4; in the proof of the spectral 
theorem for Hermitian transformations we needed also § 78, Theorems 
5 and 6. The following result takes the place of the first of these. 

Theorem 3. If A is normal, $\lambda$ is a proper value of A, and $\mathfrak{M}$ is the set 
of all solutions of $Ax = \lambda x$, then both $\mathfrak{M}$ and $\mathfrak{M}^\perp$ are invariant under A. 

proof. The fact that $\mathfrak{M}$ is invariant under A we have seen before; 
this has nothing to do with normality. To prove that $\mathfrak{M}^\perp$ is also invariant 
under A, it is sufficient to prove that $\mathfrak{M}$ is invariant under $A^*$. This is 
easy; if x is in $\mathfrak{M}$, then 

\[ A(A^*x) = A^*(Ax) = \lambda (A^*x), \]

so that $A^*x$ is also in $\mathfrak{M}$. 

This theorem is much weaker than its correspondent in § 78. The im- 
portant thing to observe, however, is that the proof of § 78, Theorem 6, 
depended only on the correspondingly weakened version of Theorem 5; 
the only subspaces that need to be considered are the ones of the type 
mentioned in the preceding theorem. 

This concludes the spade work; the spectral theorem for normal operators 
follows just as before in the Hermitian case. If in the theorems of § 79 
we replace the word “self-adjoint” by “normal,” delete all references to 
reality, and insist that the underlying inner product space be complex, 
the remaining parts of the statements and all the proofs remain unchanged. 
It is the theory of normal transformations that is of chief interest in the 
study of unitary spaces. One of the most useful facts about normal 
transformations is that spectral conditions of the type given in § 78, Theorems 
1 and 3, there shown to be necessary for the self-adjoint, positive, and 
isometric character of a transformation, are in the normal case also suf- 
ficient. 

Theorem 4. A normal transformation on a finite-dimensional unitary 
space is (1) Hermitian, (2) positive, (3) strictly positive, (4) unitary, (5) 
invertible, (6) idempotent if and only if all its proper values are (1') real, 
(2') positive, (3') strictly positive, (4') of absolute value one, (5') different 
from zero, (6') equal to zero or one. 

proof. The fact that (1), (2), (3), and (4) imply (1'), (2'), (3'), and 
(4'), respectively, follows from § 78. If A is invertible and $Ax = \lambda x$, 
with $x \ne 0$, then $x = A^{-1}Ax = \lambda A^{-1}x$, and therefore $\lambda \ne 0$; this proves 
that (5) implies (5'). If A is idempotent and $Ax = \lambda x$, with $x \ne 0$, then 
$\lambda x = Ax = A^2 x = \lambda^2 x$, so that $(\lambda - \lambda^2)x = 0$ and therefore $\lambda = \lambda^2$; this 
proves that (6) implies (6'). Observe that these proofs are valid for an 
arbitrary inner product space (not even necessarily finite-dimensional) 
and that the auxiliary assumption that A is normal is also superfluous. 

Suppose now that the spectral form of A is $\sum_j \alpha_j E_j$. Since 
$A^* = \sum_j \bar{\alpha}_j E_j$, we see that (1') implies (1). Since 

\[ (Ax, x) = \sum_j \alpha_j (E_j x, x) = \sum_j \alpha_j \|E_j x\|^2, \]

it follows that (2') implies (2). If $\alpha_j > 0$ for all j and if (Ax, x) = 0, 
then we must have $E_j x = 0$ for all j, and therefore $x = \sum_j E_j x = 0$; 
this proves that (3') implies (3). The implication from (4') to (4) follows 
from the relation 

\[ A^*A = \sum_j |\alpha_j|^2 E_j. \]

If $\alpha_j \ne 0$ for all j, we may form the linear transformation $B = \sum_j \frac{1}{\alpha_j} E_j$; 
since AB = BA = 1, it follows that (5') implies (5). Finally 
$A^2 = \sum_j \alpha_j^2 E_j$; from this we infer that (6') implies (6). 

We observe that the implication relations (5) ⇒ (5'), (2) ⇒ (2'), and 
(3') ⇒ (3) together fulfill a promise we made in § 72: if A is positive and 
invertible, then A is strictly positive. 
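
As an illustration (the matrices below are chosen for convenience and do not 
come from the text), consider on a two-dimensional unitary space the 
Hermitian, and therefore normal, matrix 

\[ A = \tfrac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \qquad A^2 = \tfrac{1}{4}\begin{pmatrix} 2 & 2 \\ 2 & 2 \end{pmatrix} = A. \]

Its proper values are 0 and 1, in agreement with (1'), (2'), and (6'); since 
one of them vanishes, (3') and (5') fail, and indeed A is neither strictly 
positive nor invertible. A sketch of why normality cannot be dropped from the 
sufficiency half of the theorem is given by 

\[ N = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \qquad N^2 = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix} \neq N, \qquad N^*N = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} \neq 1. \]

The only proper value of N is 1, so that (1'), (2'), (3'), (4'), and (6') all 
hold, and yet N is not Hermitian, not unitary, and not idempotent. 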


EXERCISES 

1. Give an example of a normal transformation that is neither Hermitian nor 
unitary. 

2. (a) If A is an arbitrary linear transformation (on a finite-dimensional unitary 
space), and if $\alpha$ and $\beta$ are complex numbers such that $|\alpha| = |\beta| = 1$, then 
$\alpha A + \beta A^*$ is normal. 

(b) If $\|Ax\| = \|A^*x\|$ for all x, then A is normal. 

(c) Is the sum of two normal transformations always normal? 

3. If A is a normal transformation on a finite-dimensional unitary space and if 
$\mathcal{M}$ is a subspace invariant under A, then the restriction of A to $\mathcal{M}$ is also normal. 

4. A linear transformation A on a finite-dimensional unitary space $\mathcal{V}$ is normal 
if and only if $A\mathcal{M} \subset \mathcal{M}$ implies $A\mathcal{M}^\perp \subset \mathcal{M}^\perp$ for every subspace $\mathcal{M}$ of $\mathcal{V}$. 

5. (a) If A is normal and idempotent, then it is self-adjoint. 

(b) If A is normal and nilpotent, then it is zero. 

(c) If A is normal and $A^3 = A^2$, then A is idempotent. Does the conclusion 
remain true if the assumption of normality is omitted? 

(d) If A is self-adjoint and if $A^k = 1$ for some strictly positive integer k, then 
$A^2 = 1$. 

6. If A and B are normal and if AB = 0, does it follow that BA = 0? 

7. Suppose that A is a linear transformation on an n-dimensional unitary space; 
let $\lambda_1, \cdots, \lambda_n$ be the proper values of A (each occurring a number of times equal to 
its algebraic multiplicity). Prove that 

\[ \sum_i |\lambda_i|^2 \le \operatorname{tr}(A^*A), \]

and that A is normal if and only if equality holds. 

8. The numerical range of a linear transformation A on a finite-dimensional 
unitary space is the set W(A) of all complex numbers of the form (Ax, x), with 
$\|x\| = 1$. 

(a) If A is normal, then W(A) is convex. (This means that if $\xi$ and $\eta$ are in 
W(A) and if $0 \le \alpha \le 1$, then $\alpha\xi + (1 - \alpha)\eta$ is also in W(A).) 

(b) If A is normal, then every extreme point of W(A) is a proper value of A. 
(An extreme point is one that does not have the form $\alpha\xi + (1 - \alpha)\eta$ for any $\xi$ 
and $\eta$ in W(A) and for any $\alpha$ properly between 0 and 1.) 

(c) It is known that the conclusion of (a) remains true even if normality is not 
assumed. This fact can be phrased as follows: if $A_1$ and $A_2$ are Hermitian 
transformations, then the set of all points of the form $((A_1x, x), (A_2x, x))$ in the real 
coordinate plane (with $\|x\| = 1$) is convex. Show that the generalization of this 
assertion to more than two Hermitian transformations is false. 

(d) Prove that the conclusion of (b) may be false for non-normal transformations. 


§ 81. Orthogonal transformations 

Since a unitary transformation on a unitary space is normal, the results 
of the preceding section include the theory of unitary transformations as 
a special case. Since, however, an orthogonal transformation on a real 
inner product space need not have any proper values, the spectral theorem, 
as we know it so far, gives us no information about orthogonal transforma- 
tions. It is not difficult to get at the facts; the theory of complexification 
was made to order for this purpose. 

Suppose that U is an orthogonal transformation on a finite-dimensional 
real inner product space $\mathcal{V}$; let $U^+$ be the extension of U to the 
complexification $\mathcal{V}^+$. Since $U^*U = 1$ (on $\mathcal{V}$), it follows that $(U^+)^*U^+ = 1$ (on $\mathcal{V}^+$), 
that is, that $U^+$ is unitary. 

Let $\lambda = \alpha + i\beta$ be a complex number ($\alpha$ and $\beta$ real), and let $\mathcal{M}$ be the 
subspace consisting of all solutions of $U^+z = \lambda z$ in $\mathcal{V}^+$. (If $\lambda$ is not a 
proper value of $U^+$, then $\mathcal{M} = \mathcal{O}$.) If z is in $\mathcal{M}$, write $z = x + iy$, with 
x and y in $\mathcal{V}$. The equation 

\[ Ux + iUy = (\alpha + i\beta)(x + iy) \]

implies (cf. § 77) that 

\[ Ux = \alpha x - \beta y \]

and 

\[ Uy = \beta x + \alpha y. \]


If we multiply the second of the last pair of equations by i and then 
subtract it from the first, we obtain 

\[ Ux - iUy = (\alpha - i\beta)(x - iy). \]

This means that $U^+\bar{z} = \bar{\lambda}\bar{z}$, where the suggestive and convenient symbol 
$\bar{z}$ denotes, of course, the vector $x - iy$. Since the argument (that is, the 
passage from $U^+z = \lambda z$ to $U^+\bar{z} = \bar{\lambda}\bar{z}$) is reversible, we have proved that 
the mapping $z \to \bar{z}$ is a one-to-one correspondence between $\mathcal{M}$ and the 
subspace $\overline{\mathcal{M}}$ consisting of all solutions $\bar{z}$ of $U^+\bar{z} = \bar{\lambda}\bar{z}$. The result implies, 
among other things, that the complex proper values of $U^+$ come in pairs; 
if $\lambda$ is one of them, then so is $\bar{\lambda}$. (This remark alone we could have 
obtained more quickly from the fact that the coefficients of the characteristic 
polynomial of $U^+$ are real.) 

We have not yet made use of the unitary character of $U^+$. One way 
we can make use of it is this. If $\lambda$ is a complex (definitely not real) proper 
value of $U^+$, then $\lambda \neq \bar{\lambda}$; it follows that if $U^+z = \lambda z$, so that $U^+\bar{z} = \bar{\lambda}\bar{z}$, 
then z and $\bar{z}$ are orthogonal. This means that 

\[ 0 = (x + iy, x - iy) = \|x\|^2 - \|y\|^2 + i((x, y) + (y, x)), \]

and hence that $\|x\|^2 = \|y\|^2$ and $(x, y) = -(y, x)$. Since a real inner 
product is symmetric ($(x, y) = (y, x)$), it follows that $(x, y) = 0$. This, 
in turn, implies that $\|z\|^2 = \|x\|^2 + \|y\|^2$ and hence that 
$\|x\| = \|y\| = \frac{1}{\sqrt{2}}\|z\|$. 

If $\lambda_1$ and $\lambda_2$ are proper values of $U^+$ with $\lambda_1 \neq \lambda_2$ and $\lambda_1 \neq \bar{\lambda}_2$, and 
if $z_1 = x_1 + iy_1$ and $z_2 = x_2 + iy_2$ are corresponding proper vectors 
($x_1$, $x_2$, $y_1$, $y_2$ in $\mathcal{V}$), then $z_1$ and $z_2$ are orthogonal and (since $\bar{z}_2$ is a proper 
vector belonging to the proper value $\bar{\lambda}_2$) $z_1$ and $\bar{z}_2$ are also orthogonal. 
Using again the expression for the complex inner product on $\mathcal{V}^+$ in terms 
of the real inner product on $\mathcal{V}$, we see that 

\[ (x_1, x_2) + (y_1, y_2) = (x_1, y_2) - (y_1, x_2) = 0 \]

and 

\[ (x_1, x_2) - (y_1, y_2) = (x_1, y_2) + (y_1, x_2) = 0. \]

It follows that the four vectors $x_1$, $x_2$, $y_1$, and $y_2$ are pairwise orthogonal. 

The unitary transformation $U^+$ could have real proper values too. Since, 
however, we know that the proper values of $U^+$ have absolute value one, 
it follows that the only possible real proper values of $U^+$ are +1 and -1. 
If $U^+(x + iy) = \pm(x + iy)$, then $Ux = \pm x$ and $Uy = \pm y$, so that the 
proper vectors of $U^+$ with real proper values are obtained by putting 
together the proper vectors of U in an obvious manner. 

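We are now ready to take the final step. Given U, choose an orthonormal 
basis, say $\mathcal{X}_1$, in the linear manifold of solutions of $Ux = x$ (in $\mathcal{V}$), and, 
similarly, choose an orthonormal basis, say $\mathcal{X}_{-1}$, in the linear manifold of 
solutions of $Ux = -x$ (in $\mathcal{V}$). (The sets $\mathcal{X}_1$ and $\mathcal{X}_{-1}$ may be empty.) 
Next, for each conjugate pair of complex proper values $\lambda$ and $\bar{\lambda}$ of $U^+$, 
choose an orthonormal basis $\{z_1, \cdots, z_r\}$ in the linear manifold of solutions 
of $U^+z = \lambda z$ (in $\mathcal{V}^+$). If $z_j = x_j + iy_j$ (with $x_j$ and $y_j$ in $\mathcal{V}$), let $\mathcal{X}_\lambda$ be the 
set $\{\sqrt{2}\,x_1, \sqrt{2}\,y_1, \cdots, \sqrt{2}\,x_r, \sqrt{2}\,y_r\}$ of vectors in $\mathcal{V}$. The results we 
have obtained imply that if we form the union of all the sets $\mathcal{X}_1$, $\mathcal{X}_{-1}$, and 
$\mathcal{X}_\lambda$, for all proper values $\lambda$ of $U^+$, we obtain an orthonormal basis of $\mathcal{V}$. 
In case $\mathcal{X}_1$ has three elements, $\mathcal{X}_{-1}$ has four elements, and there are two 
conjugate pairs $\{\lambda_1, \bar{\lambda}_1\}$ and $\{\lambda_2, \bar{\lambda}_2\}$, with $\lambda_k = \alpha_k + i\beta_k$, then the matrix 
of U with respect to the basis so constructed looks like this: 

\[
\left( \begin{array}{rrrrrrrrrrr}
1 & & & & & & & & & & \\
& 1 & & & & & & & & & \\
& & 1 & & & & & & & & \\
& & & -1 & & & & & & & \\
& & & & -1 & & & & & & \\
& & & & & -1 & & & & & \\
& & & & & & -1 & & & & \\
& & & & & & & \alpha_1 & \beta_1 & & \\
& & & & & & & -\beta_1 & \alpha_1 & & \\
& & & & & & & & & \alpha_2 & \beta_2 \\
& & & & & & & & & -\beta_2 & \alpha_2
\end{array} \right)
\]

(All terms not explicitly indicated are equal to zero.) In general, there is 
a string of +1's on the main diagonal, followed by a string of -1's, and 
then there is a string of two-by-two boxes running down the diagonal, 
each box having the form $\begin{pmatrix} \alpha & \beta \\ -\beta & \alpha \end{pmatrix}$, with $\alpha^2 + \beta^2 = 1$. The fact that 
$\alpha^2 + \beta^2 = 1$ implies that we can find a real number $\theta$ such that $\alpha = \cos\theta$ 
and $\beta = \sin\theta$; it is customary to use this trigonometric representation in 
writing the canonical form of the matrix of an orthogonal transformation. 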
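
A small worked example may help here; it is our own illustration, with the 
transformation chosen for convenience. On real three-dimensional coordinate 
space let U be the cyclic permutation of the coordinate vectors 
($Ue_1 = e_2$, $Ue_2 = e_3$, $Ue_3 = e_1$), so that 

\[ U = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}, \qquad \det(U - \lambda) = 1 - \lambda^3. \]

The proper values of $U^+$ are 1 and the conjugate pair $\lambda, \bar{\lambda}$ with 
$\lambda = -\tfrac{1}{2} + i\tfrac{\sqrt{3}}{2}$. The solutions of $Ux = x$ are spanned by the unit vector 
$\tfrac{1}{\sqrt{3}}(e_1 + e_2 + e_3)$, there are no solutions of $Ux = -x$ other than 0, and the 
construction described above (with this choice of $\lambda$ from the pair) yields the 
canonical form 

\[ \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & \sin\theta \\ 0 & -\sin\theta & \cos\theta \end{pmatrix}, \qquad \theta = \frac{2\pi}{3}, \quad \cos\theta = -\tfrac{1}{2}, \quad \sin\theta = \tfrac{\sqrt{3}}{2}. \]

Choosing $\bar{\lambda}$ instead of $\lambda$ replaces $\theta$ by $-\theta$. 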


EXERCISES 

1. Every proper value of an orthogonal transformation has absolute value 1. 

2. If A = ^ ^ , how many (real) orthogonal matrices P are there with the 
property that $P^{-1}AP$ is diagonal? 

3. State and prove a sensible analogue of the spectral theorem for normal trans- 
formations on a real inner product space. 

§ 82. Functions of transformations 

One of the most useful concepts in the theory of normal transformations 
on unitary spaces is that of a function of a transformation. If A is a 
normal transformation with spectral form $\sum_j \alpha_j E_j$ (for this discussion we 
temporarily assume that the underlying vector space is a unitary space), 
and if f is an arbitrary complex-valued function defined at least at the 
points $\alpha_j$, then we define a linear transformation f(A) by 

\[ f(A) = \sum_j f(\alpha_j) E_j. \]

Since for polynomials p (and even for rational functions) we have already 
seen that our earlier definition of p(A) yields, if A is normal, 
$p(A) = \sum_j p(\alpha_j)E_j$, we see that the new notion is a generalization of the old one. 
The advantage of considering f(A) for arbitrary functions f is for us largely 
notational; it introduces nothing conceptually new. Indeed, for an 
arbitrary f, we may write $f(\alpha_j) = \beta_j$ and then we may find a polynomial 
p that at the finite set of distinct complex numbers $\alpha_j$ takes, respectively, 
the values $\beta_j$. With this polynomial p we have f(A) = p(A), so that the 
class of transformations defined by the formation of arbitrary functions 
is nothing essentially new; it only saves the trouble of constructing a 
polynomial to fit each special case. Thus for example, if, for each complex 
number $\lambda$, we write 

\[ f_\lambda(\zeta) = 0 \quad \text{whenever } \zeta \neq \lambda \]

and 

\[ f_\lambda(\lambda) = 1, \]

then $f_\lambda(A)$ is the perpendicular projection on the subspace of solutions 
of $Ax = \lambda x$. 
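
To see the definition at work, here is a short computation of our own (the 
matrix is an assumption chosen for easy arithmetic, not an example from the 
text). The Hermitian matrix 

\[ A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} = 1 \cdot E_1 + 3 \cdot E_3, \qquad E_1 = \tfrac{1}{2}\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}, \quad E_3 = \tfrac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \]

has proper values 1 and 3, so that $f(A) = f(1)E_1 + f(3)E_3$ for every function f 
defined at 1 and 3. A polynomial that fits f at these two points is 
$p(\zeta) = f(1) + \tfrac{1}{2}(f(3) - f(1))(\zeta - 1)$, and, since $A - 1 = 2E_3$ and $E_1 + E_3 = 1$, 

\[ p(A) = f(1)(E_1 + E_3) + (f(3) - f(1))E_3 = f(1)E_1 + f(3)E_3 = f(A). \]

In particular, for $f = f_3$ the fitting polynomial is $\tfrac{1}{2}(\zeta - 1)$, and 
$\tfrac{1}{2}(A - 1) = E_3$ is the perpendicular projection on the solutions of $Ax = 3x$. 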

We observe that if $f(\zeta) = \frac{1}{\zeta}$, then (assuming of course that f is defined 
for all $\alpha_j$, that is, that $\alpha_j \neq 0$) $f(A) = A^{-1}$, and if $f(\zeta) = \bar{\zeta}$, then $f(A) = A^*$. 
These statements imply that if f is an arbitrary rational function of $\zeta$ and $\bar{\zeta}$, 
we obtain f(A) by the replacements $\zeta \to A$, $\bar{\zeta} \to A^*$, and $\frac{1}{\zeta} \to A^{-1}$. 
The symbol f(A) is, however, defined for much more general functions, 
and in the sequel we shall feel free to make use of expressions such as $e^A$ 
and $\sqrt{A}$. 

A particularly important function is the square root of positive 
transformations. We consider $f(\zeta) = \sqrt{\zeta}$, defined for all real $\zeta \ge 0$, as the 
positive square root of $\zeta$, and for every positive $A = \sum_j \alpha_j E_j$ we write 

\[ \sqrt{A} = \sum_j \sqrt{\alpha_j}\, E_j. \]

(Recall that $\alpha_j \ge 0$ for all j. The discussion that follows applies to both 
real and complex inner product spaces.) It is clear that $\sqrt{A} \ge 0$ and 
that $(\sqrt{A})^2 = A$; we should like to investigate the extent to which these 
properties characterize $\sqrt{A}$. At first glance it may seem hopeless to look 
for any uniqueness, since if we consider $B = \sum_j \pm\sqrt{\alpha_j}\, E_j$, with an 
arbitrary choice of sign in each place, we still have $A = B^2$. The 
transformation $\sqrt{A}$ that we constructed, however, was positive, and we can 
show that this additional property guarantees uniqueness. In other 
words: if $A = B^2$ and $B \ge 0$, then $B = \sqrt{A}$. To prove this, let 
$B = \sum_k \beta_k F_k$ be the spectral form of B; then 

\[ \sum_k \beta_k^2 F_k = B^2 = A = \sum_j \alpha_j E_j. \]

Since the $\beta_k$ are distinct and positive, so also are the $\beta_k^2$; the uniqueness 
of the spectral form of A implies that each $\beta_k^2$ is equal to some $\alpha_j$ (and 
vice versa), and that the corresponding E's and F's are equal. By a 
permutation of the indices we may therefore achieve $\beta_j^2 = \alpha_j$ for all j, 
so that $\beta_j = \sqrt{\alpha_j}$, as was to be shown. 
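
A concrete instance, offered as an illustration of our own with a matrix 
chosen so that the numbers come out evenly: the positive matrix 

\[ A = \begin{pmatrix} 5 & 4 \\ 4 & 5 \end{pmatrix} = 1 \cdot E + 9 \cdot F, \qquad E = \tfrac{1}{2}\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}, \quad F = \tfrac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \]

has the positive square root 

\[ \sqrt{A} = E + 3F = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}. \]

The other choices of sign also give square roots of A; for instance 
$-E + 3F = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}$ has square A, but its proper values are $-1$ and 3, so 
that it is not positive. Of the four transformations $\pm E \pm 3F$ only $E + 3F$ is 
positive, as the uniqueness argument predicts. 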

There are several important applications of the existence of square 
roots for positive operators; we shall now give two of them. 

First: we recall that in § 72 we mentioned three possible definitions of 
a positive transformation A, and adopted the weakest one, namely, that 
A is self-adjoint and $(Ax, x) \ge 0$ for all x. The strongest of the three 
possible definitions was that we could write A in the form $A = B^2$ for 
some self-adjoint B. We point out that the result of this section 
concerning square roots implies that the (seemingly) weakest one of our 
conditions implies and is therefore equivalent to the strongest. (In fact, 
we can even obtain a unique positive square root.) 

Second: in § 72 we stated also that if A and B are positive and 
commutative, then AB is also positive; we can now give an easy proof of this 
assertion. Since $\sqrt{A}$ and $\sqrt{B}$ are functions of (polynomials in) A and B 
respectively, the commutativity of A and B implies that $\sqrt{A}$ and $\sqrt{B}$ 
commute with each other; consequently 

\[ AB = \sqrt{A}\sqrt{A}\sqrt{B}\sqrt{B} = \sqrt{A}\sqrt{B}\sqrt{A}\sqrt{B} = (\sqrt{A}\sqrt{B})^2. \]

Since $\sqrt{A}$ and $\sqrt{B}$ are self-adjoint and commutative, their product is 
self-adjoint and therefore its square is positive. 

Spectral theory also makes it quite easy to characterize the matrix (with 
respect to an arbitrary orthonormal coordinate system) of a positive 
transformation A. Since det A is the product of the proper values of A, 
it is clear that $A \ge 0$ implies $\det A \ge 0$. (The discussion in § 55 applies 
directly to complex inner product spaces only; the appropriate modification 
needed for the discussion of self-adjoint transformations on possibly real 
spaces is, however, quite easy to supply.) If we consider the defining 
property of positiveness expressed in terms of the matrix $(\alpha_{ij})$ of A, that 
is, $\sum_i \sum_j \alpha_{ij}\bar{\xi}_i\xi_j \ge 0$, we observe that the last expression remains positive 
if we restrict the coordinates $(\xi_1, \cdots, \xi_n)$ by requiring that certain ones 
of them vanish. In terms of the matrix this means that if we cross out 
the columns numbered $j_1, \cdots, j_k$, say, and cross out also the rows bearing 
the same numbers, the remaining small matrix is still positive, and 
consequently so is its determinant. This fact is usually expressed by saying that 
the principal minors of the determinant of a positive matrix are positive. 
The converse is true. The coefficient of the j-th power of $\lambda$ in the 
characteristic polynomial $\det(A - \lambda)$ of A is (except for sign) the sum of all 
principal minors of $n - j$ rows and columns. The sign is alternately plus 
and minus; this implies that if A has positive principal minors and is 
self-adjoint (so that the zeros of $\det(A - \lambda)$ are known to be real), then 
the proper values of A are positive. Since the self-adjoint character of a 
matrix is ascertainable by observing whether or not it is (Hermitian) 
symmetric ($\alpha_{ij} = \bar{\alpha}_{ji}$), our comments reduce the problem of finding out whether 
or not a matrix is positive to a finite number of elementary computations. 


EXERCISES 

1. Corresponding to every unitary transformation U there is a Hermitian 
transformation A such that $U = e^{iA}$. 

2. Discuss the theory of functions of a normal transformation on a real inner 
product space. 

3. If $A \le B$ and if C is a positive transformation that commutes with both A 
and B, then $AC \le BC$. 

4. A self-adjoint transformation has a unique self-adjoint cube root. 

5. Find all Hermitian cube roots of the matrix 



6. (a) Give an example of a linear transformation A on a finite-dimensional 
unitary space such that A has no square root. 

(b) Prove that every Hermitian transformation on a finite-dimensional unitary 
space has a square root. 

(c) Does every self-adjoint transformation on a finite-dimensional Euclidean 
space have a square root? 

7. (a) Prove that if A is a positive linear transformation on a finite-dimensional 
inner product space, then $\rho(\sqrt{A}) = \rho(A)$. 

(b) If A is a linear transformation on a finite-dimensional inner product space, 
is it true that $\rho(A^*A) = \rho(A)$? 

8. If $A \ge 0$ and if $(Ax, x) = 0$ for some x, then $Ax = 0$. 

9. If $A \ge 0$, then $|(Ax, y)|^2 \le (Ax, x)(Ay, y)$ for all x and y. 

10. If the vectors $x_1, \cdots, x_k$ are linearly independent, then their Gramian is 
non-singular. 

11. Every positive matrix is a Gramian. 

12. If A and B are linear transformations on a finite-dimensional inner product 
space, and if $0 \le A \le B$, then $\det A \le \det B$. (Hint: the conclusion is trivial if 
$\det B = 0$; if $\det B \neq 0$, then $\sqrt{B}$ is invertible.) 

13. If a linear transformation A on a finite-dimensional inner product space is 
strictly positive and if $A \le B$, then $B^{-1} \le A^{-1}$. (Hint: try $A = 1$ first.) 

14. (a) If B is a Hermitian transformation on a finite-dimensional unitary space, 
then 1 + iB is invertible. 

(b) If A is positive and invertible and if B is Hermitian, then $A + iB$ is invertible. 

15. If $0 \le A \le B$, then $\sqrt{A} \le \sqrt{B}$. (Hint: compute 

\[ (\sqrt{B} + \sqrt{A} + \epsilon)(\sqrt{B} - \sqrt{A} + \epsilon), \]

and prove thereby that the second factor is invertible whenever $\epsilon > 0$.) 

16. Suppose that A is a self-adjoint transformation on a finite-dimensional 
inner product space; write $|A| = \sqrt{A^2}$, $A_+ = \tfrac{1}{2}(|A| + A)$, and 
$A_- = \tfrac{1}{2}(|A| - A)$. 

(a) Prove that $|A|$ is the smallest Hermitian transformation that commutes 
with A and for which both $A \le |A|$ and $-A \le |A|$. ("Smallest" refers, of 
course, to the ordering of Hermitian transformations.) 

(b) Prove that $A_+$ is the smallest positive transformation that commutes with 
A and for which $A \le A_+$. 

(c) Prove that $A_-$ is the smallest positive transformation that commutes with 
A and for which $-A \le A_-$. 

(d) Prove that if A and B are self-adjoint and commutative, then there exists a 
smallest self-adjoint transformation C that commutes with both A and B and for 
which both $A \le C$ and $B \le C$. 

17. (a) If A and B are positive linear transformations on a finite-dimensional 
unitary space, and if $A^2$ and $B^2$ are unitarily equivalent, then A and B are unitarily 
equivalent. 

(b) Is the real analogue of (a) true? 


§ 83. Polar decomposition 

There is another useful consequence of the theory of square roots, namely, 
the analogue of the polar representation $\zeta = \rho e^{i\theta}$ of a complex number. 

Theorem 1. If A is an arbitrary linear transformation on a 
finite-dimensional inner product space, then there is a (uniquely determined) 
positive transformation P, and there is an isometry U, such that A = UP. 
If A is invertible, then U also is uniquely determined by A. 

proof. Although it is not logically necessary to do so, we shall first 
give the proof in case A is invertible; the general proof is an obvious 
modification of this special one, and the special proof gives greater 
insight into the geometric structure of the transformation A. 

Since the transformation $A^*A$ is positive, we may find its (unique) 
positive square root, $P = \sqrt{A^*A}$. We write $V = PA^{-1}$; since $VA = P$, 
the theorem will be proved if we can prove that V is an isometry, for then 
we may write $U = V^{-1}$. Since 

\[ V^* = (A^{-1})^*P^* = (A^*)^{-1}P, \]

we see that 

\[ V^*V = (A^*)^{-1}PPA^{-1} = (A^*)^{-1}A^*AA^{-1} = 1, \]

so that V is an isometry, and we are done. 

To prove uniqueness we observe that $UP = U_0P_0$ implies $PU^* = P_0U_0^*$ 
and therefore 

\[ P^2 = PU^*UP = P_0U_0^*U_0P_0 = P_0^2. \]

Since the positive transformation $P^2 = P_0^2$ has only one positive square 
root, it follows that $P = P_0$. (In this part of the proof we did not use the 
invertibility of A.) If A is invertible, then so is P (since $P = U^{-1}A$), 
and from this we obtain (multiplying the relation $UP = U_0P_0$ on the 
right by $P^{-1} = P_0^{-1}$) that $U = U_0$. 

We turn now to the general case, where we do not assume that A is 
invertible. We form P exactly the same way as above, so that $P^2 = A^*A$, 
and then we observe that for every vector x we have 

\[ \|Px\|^2 = (Px, Px) = (P^2x, x) = (A^*Ax, x) = \|Ax\|^2. \]

If for each vector $y = Px$ in the range $\mathcal{R}(P)$ of P we write $Uy = Ax$, then 
the transformation U is length-preserving wherever it is defined. We 
must show that U is unambiguously determined, that is, that $Px_1 = Px_2$ 
implies $Ax_1 = Ax_2$. This is true since $P(x_1 - x_2) = 0$ is equivalent to 
$\|P(x_1 - x_2)\| = 0$ and this latter condition implies $\|A(x_1 - x_2)\| = 0$. 
The range of the transformation U, defined so far on the subspace $\mathcal{R}(P)$ 
only, is $\mathcal{R}(A)$. Since U is linear (and, being length-preserving, one-to-one), 
$\mathcal{R}(A)$ and $\mathcal{R}(P)$ have the same dimension, 
and therefore $(\mathcal{R}(A))^\perp$ and $(\mathcal{R}(P))^\perp$ have the same dimension. If we define 
U on $(\mathcal{R}(P))^\perp$ to be any linear and isometric transformation of $(\mathcal{R}(P))^\perp$ 
onto $(\mathcal{R}(A))^\perp$, then U, thereby determined on all $\mathcal{V}$, is an isometry with 
the property that $UPx = Ax$ for all x. This completes the proof. 

Applying the theorem just proved to A* in place of A, and then taking 
adjoints, we obtain also the dual fact that every A may be written in the 
form A = PU with an isometric U and a positive P. In contrast with the 
Cartesian decomposition (§ 70), we call the representation A = UP a 
polar decomposition of A . 
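
For a quick numerical check of the construction, here is an example of our 
own (the matrix is an assumption chosen for convenience). Let 

\[ A = \begin{pmatrix} 0 & -2 \\ 1 & 0 \end{pmatrix}, \qquad A^*A = \begin{pmatrix} 1 & 0 \\ 0 & 4 \end{pmatrix}, \qquad P = \sqrt{A^*A} = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}. \]

Since A is invertible, U is determined by $U = AP^{-1}$, and 

\[ U = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \qquad UP = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix} = \begin{pmatrix} 0 & -2 \\ 1 & 0 \end{pmatrix} = A. \]

The dual decomposition uses the same U but a different positive factor, 
namely $\sqrt{AA^*}$: 

\[ AA^* = \begin{pmatrix} 4 & 0 \\ 0 & 1 \end{pmatrix}, \qquad A = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}. \]

Here $A^*A \neq AA^*$, and correspondingly $PU = \begin{pmatrix} 0 & -1 \\ 2 & 0 \end{pmatrix} \neq UP$. 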

In terms of polar decompositions we obtain a new characterization of 
normality. 

Theorem 2. If A = UP is a polar decomposition of the linear 
transformation A, then a necessary and sufficient condition that A be normal is that 
PU = UP. 

proof. Since U is not necessarily uniquely determined by A, the 
statement is to be interpreted as follows: if A is normal, then P commutes with 
every U, and if P commutes with some U, then A is normal. Since 
$AA^* = UP^2U^* = UP^2U^{-1}$ and $A^*A = P^2$, it is clear that A is normal if and 
only if U commutes with $P^2$. Since, however, $P^2$ is a function of P and 
vice versa P is a function of $P^2$ ($P = \sqrt{P^2}$), it follows that commuting 
with $P^2$ is equivalent to commuting with P. 


EXERCISES 

1. If a linear transformation on a finite-dimensional inner product space has 
only one polar decomposition, then it is invertible. 

2. Use the functional calculus to derive the polar decomposition of a normal 
operator. 

3. (a) If A is an arbitrary linear transformation on a finite-dimensional inner 
product space, then there is a partial isometry U, and there is a positive 
transformation P, such that $\mathcal{N}(U) = \mathcal{N}(P)$ and such that A = UP. The transformations 
U and P are uniquely determined by these conditions. 

(b) The transformation A is normal if and only if the transformations U and P 
described in (a) commute with each other. 

§ 84. Commutativity 

The spectral theorem for self-adjoint and for normal operators and the 
functional calculus may also be used to solve certain problems concerning 
commutativity. This is a deep and extensive subject; more to illustrate 
some methods than for the actual results we discuss two theorems from it. 

Theorem 1. Two self-adjoint transformations A and B on a 
finite-dimensional inner product space are commutative if and only if there exists 
a self-adjoint transformation C and there exist two real-valued functions 
f and g of a real variable so that A = f(C) and B = g(C). If such a C 
exists, then we may even choose C in the form C = h(A, B), where h is a 
suitable real-valued function of two real variables. 

proof. The sufficiency of the condition is clear; we prove only the 
necessity. 

Let $A = \sum_i \alpha_i E_i$ and $B = \sum_j \beta_j F_j$ be the spectral forms of A and B; 
since A and B commute, it follows from § 79, Theorem 3, that $E_i$ and $F_j$ 
commute. Let h be any function of two real variables such that the 
numbers $h(\alpha_i, \beta_j) = \gamma_{ij}$ are all distinct, and write 

\[ C = h(A, B) = \sum_i \sum_j h(\alpha_i, \beta_j) E_i F_j. \]

(It is clear that h may even be chosen as a polynomial, and the same is 
true of the functions f and g we are about to describe.) Let f and g 
be such that $f(\gamma_{ij}) = \alpha_i$ and $g(\gamma_{ij}) = \beta_j$ for all i and j. It follows that 
f(C) = A and g(C) = B, and everything is proved. 
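
The construction can be followed on a small example of our own devising (the 
diagonal matrices and the function h below are assumptions made for 
illustration). Let 

\[ A = \begin{pmatrix} 1 & & \\ & 1 & \\ & & 2 \end{pmatrix}, \qquad B = \begin{pmatrix} 3 & & \\ & 4 & \\ & & 4 \end{pmatrix}, \]

so that $\alpha_1 = 1$, $\alpha_2 = 2$, $\beta_1 = 3$, $\beta_2 = 4$. The polynomial $h(\alpha, \beta) = 10\alpha + \beta$ 
takes the four distinct values $\gamma_{11} = 13$, $\gamma_{12} = 14$, $\gamma_{21} = 23$, $\gamma_{22} = 24$, and 

\[ C = h(A, B) = 10A + B = \begin{pmatrix} 13 & & \\ & 14 & \\ & & 24 \end{pmatrix}. \]

(The value 23 does not occur on the diagonal because $E_2F_1 = 0$.) Any f with 
$f(13) = f(14) = 1$ and $f(23) = f(24) = 2$, and any g with $g(13) = g(23) = 3$ and 
$g(14) = g(24) = 4$, satisfy f(C) = A and g(C) = B. 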

Theorem 2. If A is a normal transformation on a finite-dimensional 
unitary space and if B is an arbitrary transformation that commutes with 
A, then B commutes with A*. 

proof. Let $A = \sum_i \alpha_i E_i$ be the spectral form of A; then 
$A^* = \sum_i \bar{\alpha}_i E_i$. Let f be such a function (polynomial) of a complex variable 
that $f(\alpha_i) = \bar{\alpha}_i$ for all i. Since $A^* = f(A)$, the conclusion follows. 
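
As a sketch of how such a polynomial may be found in a particular case (the 
example is ours, not the text's), take the normal matrix 
$A = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}$, with proper values 1 and i. A polynomial f of degree one with 
$f(1) = 1$ and $f(i) = -i$ is $f(\zeta) = i\zeta + (1 - i)$, and indeed 

\[ f(A) = iA + (1 - i) = \begin{pmatrix} i + 1 - i & 0 \\ 0 & -1 + 1 - i \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & -i \end{pmatrix} = A^*. \]

Every B that commutes with A commutes with $iA + (1 - i)$, and hence with $A^*$. 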


EXERCISES 

1. (a) Prove the following generalization of Theorem 2: if $A_1$ and $A_2$ are normal 
transformations (on a finite-dimensional unitary space) and if $A_1B = BA_2$, then 
$A_1^*B = BA_2^*$. 

(b) Theorem 2 asserts that the relation of commutativity is sometimes transitive: 
if A* commutes with A and if A commutes with B, then A* commutes with B. 
Does this formulation remain true if A* is replaced by an arbitrary transformation 
C? 

2. (a) If A commutes with A* A, does it follow that A is normal? 

(b) If A*A commutes with AA*, does it follow that A is normal? 

3. (a) A linear transformation A is normal if and only if there exists a poly- 
nomial p such that A* = p(A). 

(b) If A is normal and commutes with B, then A commutes with B*. 

(c) If A and B are normal and commutative, then AB is normal. 

4. If A and B are normal and similar, then they are unitarily equivalent. 

5. (a) If A is Hermitian, if every proper value of A has multiplicity 1, and if 
AB = BA, then there exists a polynomial p such that B = p(A). 

(b) If A is Hermitian, then a necessary and sufficient condition that there exist 
a polynomial p such that B = p(A) is that B commute with every linear transforma- 
tion that commutes with A. 

6. Show that a commutative set of normal transformations on a finite-dimensional 
unitary space can be simultaneously diagonalized. 

§ 85. Self-adjoint transformations of rank one 

We have already seen (§ 51, Theorem 2) that every linear transformation 
A of rank $\rho$ is the sum of $\rho$ linear transformations of rank one. It is easy to 
see (using the spectral theorem) that if A is self-adjoint, or positive, then 
the summands may also be taken self-adjoint, or positive, respectively. 
We know (§ 51, Theorem 1) what the matrix of a transformation of rank one 
has to be; what more can we say if the transformation is self-adjoint or 
positive? 

Theorem 1. If A has rank one and is self-adjoint (or positive), then in 
every orthonormal coordinate system the matrix $(\alpha_{ij})$ of A is given by 
$\alpha_{ij} = \kappa\beta_i\bar{\beta}_j$ with a real $\kappa$ (or by $\alpha_{ij} = \gamma_i\bar{\gamma}_j$). If, conversely, [A] has this form 
in some orthonormal coordinate system, then A has rank one and is 
self-adjoint (or positive). 

proof. We know that the matrix $(\alpha_{ij})$ of a transformation A of rank 
one, in any orthonormal coordinate system $\mathcal{X} = \{x_1, \cdots, x_n\}$, is given by 
$\alpha_{ij} = \beta_i\gamma_j$. If A is self-adjoint, we must also have $\alpha_{ij} = \bar{\alpha}_{ji}$, whence 
$\beta_i\gamma_j = \bar{\beta}_j\bar{\gamma}_i$. If $\beta_i = 0$ and $\gamma_i \neq 0$ for some i, then $\bar{\beta}_j = \beta_i\gamma_j/\bar{\gamma}_i = 0$ for 
all j, whence A = 0. Since we assumed that the rank of A is one (and not 
zero), this is impossible. Similarly $\beta_i \neq 0$ and $\gamma_i = 0$ is impossible; that 
is, we can find an i for which $\beta_i\gamma_i \neq 0$. Using this i, we have 

\[ \gamma_j = (\bar{\gamma}_i/\beta_i)\bar{\beta}_j = \kappa\bar{\beta}_j \]

with some non-zero constant $\kappa$, independent of j. Since the diagonal 
elements $\alpha_{jj} = (Ax_j, x_j) = \beta_j\gamma_j$ of a self-adjoint matrix are real, we can 
even conclude that $\alpha_{ij} = \kappa\beta_i\bar{\beta}_j$ with a real $\kappa$. 

If, moreover, A is positive, then we even know that $\kappa\beta_j\bar{\beta}_j = \alpha_{jj} = 
(Ax_j, x_j)$ is positive, and therefore so is $\kappa$. In this case we write $\lambda = \sqrt{\kappa}$; 
the relation $\kappa\beta_i\bar{\beta}_j = (\lambda\beta_i)(\lambda\bar{\beta}_j)$ shows that $\alpha_{ij}$ is given by $\alpha_{ij} = \gamma_i\bar{\gamma}_j$. 

It is easy to see that these necessary conditions are also sufficient. 
If $\alpha_{ij} = \kappa\beta_i\bar{\beta}_j$ with a real $\kappa$, then A is self-adjoint. If $\alpha_{ij} = \gamma_i\bar{\gamma}_j$, and 
$x = \sum_i \xi_i x_i$, then 

\[ (Ax, x) = \sum_i \sum_j \alpha_{ij}\bar{\xi}_i\xi_j = \sum_i \sum_j \gamma_i\bar{\gamma}_j\bar{\xi}_i\xi_j 
= \Big(\sum_i \gamma_i\bar{\xi}_i\Big)\Big(\sum_j \bar{\gamma}_j\xi_j\Big) = \Big|\sum_i \gamma_i\bar{\xi}_i\Big|^2 \ge 0, \]

so that A is positive. 

As a consequence of Theorem 1 it is very easy to prove a remarkable 
theorem on positive matrices. 

Theorem 2. If A and B are positive linear transformations whose matrices 
in some orthonormal coordinate system are $(\alpha_{ij})$ and $(\beta_{ij})$ respectively, then 
the linear transformation C, whose matrix $(\gamma_{ij})$ in the same coordinate 
system is given by $\gamma_{ij} = \alpha_{ij}\beta_{ij}$ for all i and j, is also positive. 

proof. Since we may write both A and B as sums of positive 
transformations of rank one, so that 

\[ \alpha_{ij} = \sum_p \alpha_i^p\,\bar{\alpha}_j^p \]

and 

\[ \beta_{ij} = \sum_q \beta_i^q\,\bar{\beta}_j^q, \]

it follows that 

\[ \gamma_{ij} = \sum_p \sum_q (\alpha_i^p\beta_i^q)\,\overline{(\alpha_j^p\beta_j^q)}. \]

(The superscripts here are not exponents.) Since a sum of positive 
matrices is positive, it will be sufficient to prove that, for each fixed p and 
q, the matrix $\big((\alpha_i^p\beta_i^q)\,\overline{(\alpha_j^p\beta_j^q)}\big)$ is positive, and this follows from Theorem 1. 

The proof shows, by the way, that Theorem 2 remains valid if we 
replace "positive" by "self-adjoint" in both hypothesis and conclusion; in 
most applications, however, it is only the actually stated version that is 
useful. The matrix $(\gamma_{ij})$ described in Theorem 2 is called the Hadamard 
product of $(\alpha_{ij})$ and $(\beta_{ij})$. 
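
A two-by-two illustration of our own (the matrices are assumptions chosen for 
easy arithmetic) shows how the entrywise product differs from the ordinary 
product. The symmetric matrices 

\[ A = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} \]

both have trace 3 and determinant 1, so that their proper values are positive 
and both are positive matrices. Their entrywise product is 

\[ C = \begin{pmatrix} 2 \cdot 1 & 1 \cdot 1 \\ 1 \cdot 1 & 1 \cdot 2 \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \]

with proper values 1 and 3, and it is positive, as Theorem 2 asserts. The 
ordinary product, on the other hand, is 

\[ AB = \begin{pmatrix} 3 & 4 \\ 2 & 3 \end{pmatrix}, \]

which is not even symmetric and therefore not positive; A and B do not commute. 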


EXERCISES 

1. Suppose that $\mathcal{U}$ and $\mathcal{V}$ are finite-dimensional inner product spaces (both real 
or both complex). 

(a) There is a unique inner product on the vector space of all bilinear forms on 
$\mathcal{U} \oplus \mathcal{V}$ such that if $w_1(x, y) = (x, x_1)(y, y_1)$ and $w_2(x, y) = (x, x_2)(y, y_2)$, then 
$(w_1, w_2) = (x_2, x_1)(y_2, y_1)$. 

(b) There is a unique inner product on the tensor product $\mathcal{U} \otimes \mathcal{V}$ such that if 
$z_1 = x_1 \otimes y_1$ and $z_2 = x_2 \otimes y_2$, then $(z_1, z_2) = (x_1, x_2)(y_1, y_2)$. 

(c) If $\{x_i\}$ and $\{y_p\}$ are orthonormal bases in $\mathcal{U}$ and $\mathcal{V}$, respectively, then the 
vectors $x_i \otimes y_p$ form an orthonormal basis in $\mathcal{U} \otimes \mathcal{V}$. 

2. Is the tensor product of two Hermitian transformations necessarily Hermitian? 
What about unitary transformations? What about normal transformations? 


§ 86. Convergence of vectors 

Essentially the only way in which we exploited, so far, the existence of 
an inner product in an inner product space was to introduce the notion of 
a normal transformation together with certain important special cases 
of it. A much more obvious circle of ideas is the study of the convergence 
problems that arise in an inner product space. 

Let us see what we might mean by the assertion that a sequence $(x_n)$ 
of vectors in $\mathcal{V}$ converges to a vector x in $\mathcal{V}$. There are two possibilities 
that suggest themselves: 

(i) $\|x_n - x\| \to 0$ as $n \to \infty$; 

(ii) $(x_n - x, y) \to 0$ as $n \to \infty$, for each fixed y in $\mathcal{V}$. 

If (i) is true, then we have, for every y, 

\[ |(x_n - x, y)| \le \|x_n - x\| \cdot \|y\| \to 0, \]

so that (ii) is true. In a finite-dimensional space the converse implication 
is valid: (ii) ⇒ (i). To prove this, let $\{z_1, \cdots, z_N\}$ be an orthonormal 
basis in $\mathcal{V}$. (Often in this chapter we shall write N for the dimension of a 
finite-dimensional vector space, in order to reserve n for the dummy 
variable in limiting processes.) If we assume (ii), then $(x_n - x, z_i) \to 0$ 
for each $i = 1, \cdots, N$. Since (§ 63, Theorem 2) 

\[ \|x_n - x\|^2 = \sum_i |(x_n - x, z_i)|^2, \]

it follows that $\|x_n - x\| \to 0$, as was to be proved. 

Concerning the convergence of vectors (in either of the two equivalent 
senses) we shall use without proof the following facts. (All these facts are 
easy consequences of our definitions and of the properties of convergence 
in the usual domain of complex numbers; we assume that the reader has 
a modicum of familiarity with these notions.) The expression $\alpha x + \beta y$ 
defines a continuous function of all its arguments simultaneously; that is, 
if $(\alpha_n)$ and $(\beta_n)$ are sequences of numbers and $(x_n)$ and $(y_n)$ are sequences 
of vectors, then $\alpha_n \to \alpha$, $\beta_n \to \beta$, $x_n \to x$, and $y_n \to y$ imply that 
$\alpha_n x_n + \beta_n y_n \to \alpha x + \beta y$. If $\{z_i\}$ is an orthonormal basis in $\mathcal{V}$, and if 
$x_n = \sum_i \alpha_{in} z_i$ and $x = \sum_i \alpha_i z_i$, then a necessary and sufficient condition 
that $x_n \to x$ is that $\alpha_{in} \to \alpha_i$ (as $n \to \infty$) for each $i = 1, \cdots, N$. (Thus 
the notion of convergence here defined coincides with the usual one in 
N-dimensional real or complex coordinate space.) Finally, we shall assume 
as known the fact that a finite-dimensional inner product space with the 
metric defined by the norm is complete; that is, if $(x_n)$ is a sequence of 
vectors for which $\|x_n - x_m\| \to 0$, as $n, m \to \infty$, then there is a (unique) 
vector x such that $x_n \to x$ as $n \to \infty$. 

§ 87. Norm 

The metric properties of vectors have certain important implications 
for the metric properties of linear transformations, which we now begin 
to study. 

Definition. A linear transformation $A$ on an inner product space $\mathcal{V}$
is bounded if there exists a constant $K$ such that $\| Ax \| \le K \| x \|$ for
every vector $x$ in $\mathcal{V}$. The greatest lower bound of all constants $K$ with
this property is called the norm (or bound) of $A$ and is denoted by $\| A \|$.

Clearly if $A$ is bounded, then $\| Ax \| \le \| A \| \cdot \| x \|$ for all $x$. For examples
we may consider the cases where $A$ is a (non-zero) perpendicular projection
or an isometry; § 75, Theorem 1, and the theorem of § 73, respectively,
imply that in both cases $\| A \| = 1$. Consideration of the vectors defined
by $x_n(t) = t^n$ in $\mathcal{P}$ shows that the differentiation transformation is not
bounded.
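
The last remark can be made concrete by a short worked computation. It is
only a sketch under an assumption: the inner product on $\mathcal{P}$ is taken to be
$(x, y) = \int_0^1 x(t)\overline{y(t)}\,dt$, and the symbol $D$ for the differentiation
transformation is introduced here purely for illustration.

```latex
\begin{aligned}
\| x_n \|^2 &= \int_0^1 t^{2n}\,dt = \frac{1}{2n+1}, \\
\| D x_n \|^2 &= \int_0^1 n^2 t^{2n-2}\,dt = \frac{n^2}{2n-1}, \\
\frac{\| D x_n \|^2}{\| x_n \|^2} &= \frac{n^2 (2n+1)}{2n-1} \ge n^2
\qquad (n \ge 1).
\end{aligned}
```

Under that assumption $\| D x_n \| \ge n \| x_n \|$ for every $n \ge 1$, so no single
constant $K$ can satisfy $\| Dx \| \le K \| x \|$ for all $x$.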

Because in the sequel we shall have occasion to consider quite a few
upper and lower bounds similar to $\| A \|$, we introduce a convenient
notation. If $P$ is any possible property of real numbers $t$, we shall denote the
set of all real numbers $t$ possessing the property $P$ by the symbol $\{t : P\}$,
and we shall denote greatest lower bound and least upper bound by inf
(for infimum) and sup (for supremum) respectively. In this notation
we have, for example,

$$\| A \| = \inf \{K : \| Ax \| \le K \| x \| \text{ for all } x\}.$$

The notion of boundedness is closely connected with the notion of
continuity. If $A$ is bounded and if $\epsilon$ is any positive number, by writing
$\delta = \epsilon / \| A \|$ we may make sure that $\| x - y \| < \delta$ implies that

$$\| Ax - Ay \| = \| A(x - y) \| \le \| A \| \cdot \| x - y \| < \epsilon;$$

in other words boundedness implies (uniform) continuity. (In this proof
we tacitly assumed that $\| A \| \ne 0$; the other case is trivial.) In view of
this fact the following result is a welcome one.

Theorem. Every linear transformation on a finite-dimensional inner 
product space is bounded. 

proof. Suppose that $A$ is a linear transformation on $\mathcal{V}$; let $\{x_1, \dots, x_N\}$
be an orthonormal basis in $\mathcal{V}$ and write

$$K_0 = \max \{\| Ax_1 \|, \dots, \| Ax_N \|\}.$$

Since an arbitrary vector $x$ may be written in the form $x = \sum_i (x, x_i) x_i$,
we obtain, applying the Schwarz inequality and remembering that $\| x_i \| = 1$,

$$\begin{aligned}
\| Ax \| &= \Bigl\| A\Bigl(\sum_i (x, x_i) x_i\Bigr) \Bigr\| \\
&= \Bigl\| \sum_i (x, x_i) A x_i \Bigr\| \le \sum_i | (x, x_i) | \cdot \| A x_i \| \\
&\le \sum_i \| x \| \cdot \| x_i \| \cdot \| A x_i \| \le K_0 \sum_i \| x \| \\
&= N K_0 \| x \|.
\end{aligned}$$

In other words, $K = N K_0$ is a bound of $A$, and the proof is complete.
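
As a numerical illustration of the bound just obtained, the following sketch
(plain Python, real 3-by-3 case, with the standard basis playing the role of
the orthonormal basis) computes $K_0$ and checks $\| Ax \| \le N K_0 \| x \|$ on random
vectors. The matrix entries and the helper names are assumptions made here
for illustration; they do not come from the text.

```python
import math
import random

def mat_vec(a, x):
    """Apply the matrix a (a list of rows) to the vector x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]

def norm(x):
    """Euclidean norm of a real vector."""
    return math.sqrt(sum(t * t for t in x))

# Hypothetical example matrix; any real square matrix would do.
A = [[2.0, -1.0, 0.5],
     [0.0, 3.0, 1.0],
     [4.0, 0.0, -2.0]]
N = len(A)

# K0 = max ||A x_i|| over the standard basis vectors x_i.
basis = [[1.0 if i == j else 0.0 for j in range(N)] for i in range(N)]
K0 = max(norm(mat_vec(A, e)) for e in basis)
K = N * K0  # the bound produced by the proof

# Check ||Ax|| <= K ||x|| on a fixed set of pseudo-random vectors.
random.seed(0)
samples = [[random.uniform(-1, 1) for _ in range(N)] for _ in range(1000)]
bound_holds = all(norm(mat_vec(A, x)) <= K * norm(x) + 1e-12 for x in samples)
print(K0, K, bound_holds)
```

The bound $N K_0$ is usually far from the norm itself; the point of the proof is
only that some finite bound exists.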

It is no accident that the dimension $N$ of $\mathcal{V}$ enters into our evaluation;
we have already seen that the theorem is not true in infinite-dimensional
spaces.


EXERCISES 

1. (a) Prove that the inner product is a continuous function (and therefore so
also is the norm); that is, if $x_n \to x$ and $y_n \to y$, then $(x_n, y_n) \to (x, y)$.

(b) Is every linear functional continuous? How about multilinear forms?

2. A linear transformation $A$ on an inner product space is said to be bounded
from below if there exists a (strictly) positive constant $K$ such that $\| Ax \| \ge K \| x \|$
for every $x$. Prove that (on a finite-dimensional space) $A$ is bounded from below
if and only if it is invertible.

3. If a linear transformation on an inner product space (not necessarily
finite-dimensional) is continuous at one point, then it is bounded (and consequently
continuous over the whole space).

4. For each positive integer $n$ construct a projection $E_n$ (not a perpendicular
projection) such that $\| E_n \| \ge n$.

5. (a) If $U$ is a partial isometry other than 0, then $\| U \| = 1$.

(b) If $U$ is an isometry, then $\| UA \| = \| AU \| = \| A \|$ for every linear
transformation $A$.

6. If $E$ and $F$ are perpendicular projections, with ranges $\mathcal{M}$ and $\mathcal{N}$ respectively,
and if $\| E - F \| < 1$, then $\dim \mathcal{M} = \dim \mathcal{N}$.

7. (a) If $A$ is normal, then $\| A^n \| = \| A \|^n$ for every positive integer $n$.

(b) If $A$ is a linear transformation on a 2-dimensional unitary space and if
$\| A^2 \| = \| A \|^2$, then $A$ is normal.

(c) Is the conclusion of (b) true for transformations on a 3-dimensional space?


§ 88. Expressions for the norm 

To facilitate working with the norm of a transformation, we consider the
following four expressions:

$$\begin{aligned}
p &= \sup \{\| Ax \| / \| x \| : x \ne 0\}, \\
q &= \sup \{\| Ax \| : \| x \| = 1\}, \\
r &= \sup \{| (Ax, y) | / (\| x \| \cdot \| y \|) : x \ne 0,\ y \ne 0\}, \\
s &= \sup \{| (Ax, y) | : \| x \| = \| y \| = 1\}.
\end{aligned}$$

In accordance with our definition of the brace notation, the expression
$\{\| Ax \| : \| x \| = 1\}$, for example, means the set of all real numbers of the
form $\| Ax \|$, considered for all $x$'s for which $\| x \| = 1$.

Since $\| Ax \| \le K \| x \|$ is trivially true with any $K$ if $x = 0$, the definition
of supremum implies that $p = \| A \|$; we shall prove that, in fact,
$p = q = r = s = \| A \|$. Since the supremum in the expression for $q$ is extended
over a subset of the corresponding set for $p$ (that is, if $\| x \| = 1$, then
$\| Ax \| / \| x \| = \| Ax \|$), we see that $q \le p$; a similar argument shows that
$s \le r$.

For any $x \ne 0$ we consider $y = x / \| x \|$ (so that $\| y \| = 1$); we have
$\| Ax \| / \| x \| = \| Ay \|$. In other words, every number of the set whose
supremum is $p$ occurs also in the corresponding set for $q$; it follows that
$p \le q$, and consequently that $p = q = \| A \|$.

Similarly if $x \ne 0$ and $y \ne 0$, we consider $x' = x / \| x \|$ and $y' = y / \| y \|$;
we have

$$| (Ax, y) | / (\| x \| \cdot \| y \|) = | (Ax', y') |,$$

and hence, by the argument just used, $r \le s$, so that $r = s$.

To consolidate our position, we note that so far we have proved that

$$p = q = \| A \| \quad \text{and} \quad r = s.$$

Since

$$\frac{| (Ax, y) |}{\| x \| \cdot \| y \|} \le \frac{\| Ax \| \cdot \| y \|}{\| x \| \cdot \| y \|} = \frac{\| Ax \|}{\| x \|},$$

it follows that $r \le p$; we shall complete the proof by showing that $p \le r$.
For this purpose we consider any vector $x$ for which $Ax \ne 0$ (so that
$x \ne 0$); for such an $x$ we write $y = Ax$ and we have

$$\| Ax \| / \| x \| = | (Ax, y) | / (\| x \| \cdot \| y \|).$$

In other words, we proved that every number that occurs in the set defining
$p$, and is different from zero, occurs also in the set of which $r$ is the supremum;
this clearly implies the desired result.
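
The equality of these expressions can be observed numerically. The sketch
below (plain Python) approximates $q$ and $s$ for one real 2-by-2 matrix by
sampling unit vectors on a grid, and compares both with the value obtained
from the largest proper value of $A'A$. The matrix, the grid sizes, and the
helper names are assumptions chosen for illustration; a grid search only
approximates a supremum from below.

```python
import math

# Hypothetical 2-by-2 real example; the adjoint is the transpose here.
A = [[1.0, 1.0],
     [0.0, 1.0]]

def apply(a, x):
    """Return the vector Ax for a 2-by-2 matrix a."""
    return (a[0][0] * x[0] + a[0][1] * x[1],
            a[1][0] * x[0] + a[1][1] * x[1])

def unit_vectors(count):
    """Evenly spaced unit vectors on the circle."""
    return [(math.cos(2 * math.pi * k / count),
             math.sin(2 * math.pi * k / count)) for k in range(count)]

# q = sup { ||Ax|| : ||x|| = 1 }, approximated from below on a fine grid.
q = max(math.hypot(*apply(A, x)) for x in unit_vectors(20000))

# s = sup { |(Ax, y)| : ||x|| = ||y|| = 1 }, on a coarser grid of pairs.
grid = unit_vectors(720)
s = max(abs(ax[0] * y[0] + ax[1] * y[1])
        for ax in (apply(A, x) for x in grid) for y in grid)

# Closed form for this particular A: the square root of the largest
# proper value of A'A = [[1, 1], [1, 2]], which equals (1 + sqrt 5) / 2.
exact = (1 + math.sqrt(5)) / 2
print(q, s, exact)
```

Both sampled values stay below the exact norm and approach it as the grids
are refined, in line with $q = s = \| A \|$.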

The numerical function of a transformation $A$ given by $\| A \|$ satisfies
the following four conditions:

(1) $\| A + B \| \le \| A \| + \| B \|$,

(2) $\| AB \| \le \| A \| \cdot \| B \|$,

(3) $\| \alpha A \| = | \alpha | \cdot \| A \|$,

(4) $\| A^* \| = \| A \|$.

The proof of the first three of these is immediate from the definition of the
norm of a transformation; for the proof of (4) we use the equation $\| A \| = r$,
as follows. Since

$$| (Ax, y) | = | (x, A^*y) | \le \| x \| \cdot \| A^*y \| \le \| A^* \| \cdot \| x \| \cdot \| y \|,$$

we see that $\| A \| \le \| A^* \|$; replacing $A$ by $A^*$ and $A^*$ by $A^{**} = A$, we
obtain the reverse inequality.


EXERCISES 

1. If $B$ is invertible, then $\| AB \| \ge \| A \| / \| B^{-1} \|$ for every $A$.

2. Is it true for every linear transformation $A$ that $\| A^*A \| = \| AA^* \|$?

3. (a) If $A$ is Hermitian and if $\alpha \ge 0$, then a necessary and sufficient condition
that $\| A \| \le \alpha$ is that $-\alpha \le A \le \alpha$.

(b) If $A$ is Hermitian, if $\alpha \le A \le \beta$, and if $p$ is a polynomial such that
$p(t) \ge 0$ whenever $\alpha \le t \le \beta$, then $p(A) \ge 0$.

(c) If $A$ is Hermitian, if $\alpha \le A \le \beta$, and if $p$ is a polynomial such that
$p(t) \ne 0$ whenever $\alpha \le t \le \beta$, then $p(A)$ is invertible.


§ 89. Bounds of a self-adjoint transformation 

As usual we can say a little more about the special case of self-adjoint
transformations than in the general case. We consider, for any self-adjoint
transformation $A$, the sets of real numbers

$$\Phi = \{(Ax, x) / \| x \|^2 : x \ne 0\}$$

and

$$\Psi = \{(Ax, x) : \| x \| = 1\}.$$

It is clear that $\Psi \subset \Phi$. If, for every $x \ne 0$, we write $y = x / \| x \|$, then
$\| y \| = 1$ and $(Ax, x) / \| x \|^2 = (Ay, y)$, so that every number in $\Phi$ occurs
also in $\Psi$ and consequently $\Phi = \Psi$. We write

$$\alpha = \inf \Phi = \inf \Psi,$$

$$\beta = \sup \Phi = \sup \Psi,$$

and we say that $\alpha$ is the lower bound and $\beta$ is the upper bound of the
self-adjoint transformation $A$. If we recall the definition of a positive
transformation, we see that $\alpha$ is the greatest real number for which $A - \alpha \ge 0$
and $\beta$ is the least real number for which $\beta - A \ge 0$. Concerning
these numbers we assert that

$$\gamma = \max \{| \alpha |, | \beta |\} = \| A \|.$$

Half the proof is easy. Since

$$| (Ax, x) | \le \| Ax \| \cdot \| x \| \le \| A \| \cdot \| x \|^2,$$

it is clear that both $| \alpha |$ and $| \beta |$ are dominated by $\| A \|$. To prove the
reverse inequality, we observe that the positive character of the two
linear transformations $\gamma - A$ and $\gamma + A$ implies that both

$$(\gamma + A)^*(\gamma - A)(\gamma + A) = (\gamma + A)(\gamma - A)(\gamma + A)$$

and

$$(\gamma - A)^*(\gamma + A)(\gamma - A) = (\gamma - A)(\gamma + A)(\gamma - A)$$

are positive, and, therefore, so also is their sum $2\gamma(\gamma^2 - A^2)$. Since
$\gamma = 0$ implies $\| A \| = 0$, the assertion is trivial in this case; in any other
case we may divide by $2\gamma$ and obtain the result that $\gamma^2 - A^2 \ge 0$. In
other words,

$$\gamma^2 \| x \|^2 = \gamma^2 (x, x) \ge (A^2 x, x) = \| Ax \|^2,$$

whence $\gamma \ge \| A \|$, and the proof is complete.
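
The assertion $\gamma = \| A \|$ can be checked on a small example. The following
sketch (plain Python) samples the quadratic form $(Ax, x)$ over unit vectors of
the real plane to approximate $\alpha$ and $\beta$, and compares $\gamma$ with a sampled value
of $\sup \| Ax \|$. The matrix and the grid size are assumptions made for this
illustration only.

```python
import math

# Hypothetical real symmetric (hence self-adjoint) 2-by-2 example.
A = [[2.0, 1.0],
     [1.0, -3.0]]

def image(a, x):
    """Return the vector Ax."""
    return (a[0][0] * x[0] + a[0][1] * x[1],
            a[1][0] * x[0] + a[1][1] * x[1])

def quadratic_form(a, x):
    """Return (Ax, x) for a real vector x."""
    ax = image(a, x)
    return ax[0] * x[0] + ax[1] * x[1]

# Unit vectors on a half circle; this suffices because (Ax, x) and ||Ax||
# are unchanged when x is replaced by -x.
count = 20000
units = [(math.cos(math.pi * k / count), math.sin(math.pi * k / count))
         for k in range(count)]

values = [quadratic_form(A, x) for x in units]
alpha, beta = min(values), max(values)   # lower and upper bounds of A
gamma = max(abs(alpha), abs(beta))

op_norm = max(math.hypot(*image(A, x)) for x in units)  # sampled sup ||Ax||
print(alpha, beta, gamma, op_norm)
```

For this matrix the proper values are $(-1 \pm \sqrt{29})/2$, so the sampled $\alpha$ and $\beta$
should lie close to those two numbers, and $\gamma$ close to the sampled norm.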

We call the reader's attention to the fact that the computation in the
main body of this proof could have been avoided entirely. Since both
$\gamma - A$ and $\gamma + A$ are positive, and since they commute, we may conclude
immediately (§ 82) that their product $\gamma^2 - A^2$ is positive. We presented
the roundabout method in accordance with the principle that, with an
eye to the generalizations of the theory, one should avoid using the spectral
theorem whenever possible. Our proof of the fact that the positiveness
and commutativity of $A$ and $B$ imply the positiveness of $AB$ was based
on the existence of square roots for positive transformations. This fact,
to be sure, can be obtained by so-called “elementary” methods, that is,
methods not using the spectral theorem, but even the simplest elementary
proof involves complications that are purely technical and, for our
purposes, not particularly useful.

§ 90. Minimax principle 

A very elegant and useful fact concerning self-adjoint transformations
is the following minimax principle.

Theorem. Let $A$ be a self-adjoint transformation on an $n$-dimensional
inner product space $\mathcal{V}$, and let $\lambda_1, \dots, \lambda_n$ be the (not necessarily distinct)
proper values of $A$, with the notation so chosen that $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n$.
If, for each subspace $\mathcal{M}$ of $\mathcal{V}$,

$$\mu(\mathcal{M}) = \sup \{(Ax, x) : x \text{ in } \mathcal{M},\ \| x \| = 1\},$$

and if, for $k = 1, \dots, n$,

$$\mu_k = \inf \{\mu(\mathcal{M}) : \dim \mathcal{M} = n - k + 1\},$$

then $\mu_k = \lambda_k$ for $k = 1, \dots, n$.

proof. Let $\{x_1, \dots, x_n\}$ be an orthonormal basis in $\mathcal{V}$ for which
$Ax_i = \lambda_i x_i$, $i = 1, \dots, n$ (§ 79); let $\mathcal{M}_k$ be the subspace spanned by
$x_1, \dots, x_k$, for $k = 1, \dots, n$. Since the dimension of $\mathcal{M}_k$ is $k$, the subspace
$\mathcal{M}_k$ cannot be disjoint from any $(n - k + 1)$-dimensional subspace $\mathcal{M}$ in $\mathcal{V}$;
if $\mathcal{M}$ is any such subspace, we may find a vector $x$ belonging to both $\mathcal{M}_k$
and $\mathcal{M}$ and such that $\| x \| = 1$. For this $x = \sum_{i=1}^{k} \xi_i x_i$ we have

$$(Ax, x) = \sum_{i=1}^{k} \lambda_i | \xi_i |^2 \ge \lambda_k \sum_{i=1}^{k} | \xi_i |^2 = \lambda_k \| x \|^2 = \lambda_k,$$

so that $\mu(\mathcal{M}) \ge \lambda_k$.

If, on the other hand, we consider the particular $(n - k + 1)$-dimensional
subspace $\mathcal{M}_0$ spanned by $x_k, x_{k+1}, \dots, x_n$, then, for each $x = \sum_{i=k}^{n} \xi_i x_i$
in this subspace, we have (assuming $\| x \| = 1$)

$$(Ax, x) = \sum_{i=k}^{n} \lambda_i | \xi_i |^2 \le \lambda_k \sum_{i=k}^{n} | \xi_i |^2 = \lambda_k \| x \|^2 = \lambda_k,$$

so that $\mu(\mathcal{M}_0) \le \lambda_k$.

In other words, as $\mathcal{M}$ runs over all $(n - k + 1)$-dimensional subspaces,
$\mu(\mathcal{M})$ is always $\ge \lambda_k$, and is at least once $\le \lambda_k$; this shows that $\mu_k = \lambda_k$,
as was to be proved.

In particular for k = 1 we see (using § 89) that if A is self-adjoint, then 
|| A || is equal to the maximum of the absolute values of the proper values 
of A. 
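
A small worked instance may help to fix the roles of the sup and the inf in
the minimax principle. The example is an assumption made for illustration:
the transformation on the real plane whose matrix is diagonal with entries
3 and 1, so that $n = 2$, $\lambda_1 = 3$, and $\lambda_2 = 1$; the symbol $\mathcal{M}_\theta$ for the line spanned
by $(\cos\theta, \sin\theta)$ is likewise introduced only here.

```latex
\begin{aligned}
k = 1:&\quad \dim \mathcal{M} = 2 \text{ forces } \mathcal{M} = \mathcal{V}, \qquad
  \mu_1 = \mu(\mathcal{V}) = \sup \{3\xi_1^2 + \xi_2^2 : \xi_1^2 + \xi_2^2 = 1\} = 3 = \lambda_1, \\
k = 2:&\quad \dim \mathcal{M}_\theta = 1, \qquad
  \mu(\mathcal{M}_\theta) = 3\cos^2\theta + \sin^2\theta = 1 + 2\cos^2\theta, \\
&\quad \mu_2 = \inf_\theta \, (1 + 2\cos^2\theta) = 1 = \lambda_2
  \quad \text{(attained at } \theta = \pi/2\text{)}.
\end{aligned}
```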
EXERCISES

1. If $\lambda$ is a proper value of a linear transformation $A$ on a finite-dimensional inner
product space, then $| \lambda | \le \| A \|$.

2. If $A$ and $B$ are linear transformations on a finite-dimensional unitary space,
and if $C = AB - BA$, then $\| 1 - C \| \ge 1$. (Hint: consider the proper values of
$C$.)

3. If $A$ and $B$ are linear transformations on a finite-dimensional unitary space,
if $C = AB - BA$, and if $C$ commutes with $A$, then $C$ is not invertible. (Hint:
if $C$ is invertible, then $2 \| B \| \cdot \| A \| \cdot \| A^{k-1} \| \ge k \| A^{k-1} \| / \| C^{-1} \|$.)

4. (a) If $A$ is a normal linear transformation on a finite-dimensional unitary
space, then $\| A \|$ is equal to the maximum of the absolute values of the proper
values of $A$.

(b) Does the conclusion of (a) remain true if the hypothesis of normality is
omitted?

5. The spectral radius of a linear transformation $A$ on a finite-dimensional
unitary space, denoted by $r(A)$, is the maximum of the absolute values of the proper
values of $A$.

(a) If $f(\lambda) = ((1 - \lambda A)^{-1} x, y)$, then $f$ is an analytic function of $\lambda$ in the region
determined by $| \lambda | < 1 / r(A)$ (for each fixed $x$ and $y$).

(b) There exists a constant $K$ such that $| \lambda |^n \| A^n \| \le K$ whenever $| \lambda | < 1 / r(A)$
and $n = 0, 1, 2, \dots$. (Hint: for each $x$ and $y$ there exists a constant $K$ such that
$| \lambda^n (A^n x, y) | \le K$ for all $n$.)

(c) $\limsup_n \| A^n \|^{1/n} \le r(A)$.

(d) $(r(A))^n \le r(A^n)$, $n = 0, 1, 2, \dots$.

(e) $r(A) = \lim_n \| A^n \|^{1/n}$.

6. If $A$ is a linear transformation on a finite-dimensional unitary space, then
a necessary and sufficient condition that $r(A) = \| A \|$ is that $\| A^n \| = \| A \|^n$
for $n = 0, 1, 2, \dots$.

7. (a) If $A$ is a positive linear transformation on a finite-dimensional inner
product space, and if $AB$ is self-adjoint, then

$$| (ABx, x) | \le \| B \| \cdot (Ax, x)$$

for every vector $x$.

(b) Does the conclusion of (a) remain true if $\| B \|$ is replaced by $r(B)$?


§ 91. Convergence of linear transformations

We return now to the consideration of convergence problems. There
are three obvious senses in which we may try to define the convergence of a
sequence $(A_n)$ of linear transformations to a fixed linear transformation $A$.

(i) $\| A_n - A \| \to 0$ as $n \to \infty$.

(ii) $\| A_n x - Ax \| \to 0$ as $n \to \infty$ for each fixed $x$.

(iii) $| (A_n x, y) - (Ax, y) | \to 0$ as $n \to \infty$ for each fixed $x$ and $y$.

If (i) is true, then, for every $x$,

$$\| A_n x - Ax \| = \| (A_n - A)x \| \le \| A_n - A \| \cdot \| x \| \to 0,$$

so that (i) => (ii). We have already seen (§ 86) that (ii) => (iii) and that
in finite-dimensional spaces (iii) => (ii). It is even true that in
finite-dimensional spaces (ii) => (i), so that all three conditions are equivalent.
To prove this, let $\{x_1, \dots, x_N\}$ be an orthonormal basis in $\mathcal{V}$. If we suppose
that (ii) holds, then, for each $\epsilon > 0$, we may find an $n_0 = n_0(\epsilon)$ such that
$\| A_n x_i - A x_i \| < \epsilon$ for $n \ge n_0$ and for $i = 1, \dots, N$. It follows that for
an arbitrary $x = \sum_i (x, x_i) x_i$ we have

$$\| (A_n - A)x \| = \Bigl\| \sum_i (x, x_i)(A_n - A)x_i \Bigr\|
\le \sum_i \| x \| \cdot \| (A_n - A)x_i \| \le \epsilon N \| x \|,$$

and this implies (i).
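
The estimate $\| A_n - A \| \le \epsilon N$ used in this argument can be watched in a toy
case. In the sketch below (plain Python) $A_n$ is the rotation of the real plane
through the angle $1/n$ and $A$ is the identity; both choices, and the helper
names, are assumptions made for illustration. For each $n$ the code computes
$\epsilon_n = \max_i \| (A_n - A)x_i \|$ on the standard basis and a sampled value of
$\| A_n - A \|$.

```python
import math

def rotation_minus_identity(theta):
    """Matrix of A_n - A, where A_n is rotation by theta and A = 1."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c - 1.0, -s], [s, c - 1.0]]

def column_norm(d, i):
    """||(A_n - A) x_i|| for the i-th standard basis vector x_i."""
    return math.hypot(d[0][i], d[1][i])

def sampled_norm(d, count=20000):
    """sup ||Dx|| over a grid of unit vectors (a lower approximation)."""
    best = 0.0
    for k in range(count):
        x0 = math.cos(2 * math.pi * k / count)
        x1 = math.sin(2 * math.pi * k / count)
        best = max(best, math.hypot(d[0][0] * x0 + d[0][1] * x1,
                                    d[1][0] * x0 + d[1][1] * x1))
    return best

N = 2
rows = []
for n in (1, 10, 100, 1000):
    d = rotation_minus_identity(1.0 / n)
    eps = max(column_norm(d, i) for i in range(N))  # data from condition (ii)
    rows.append((n, eps, sampled_norm(d)))
    print(*rows[-1])
```

In every row the sampled norm is at most $N \epsilon_n$, and both quantities tend to
zero together, as the equivalence of (i) and (ii) requires.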

It is also easy to prove that if the norm is used to define a distance for
transformations, then the resulting metric space is complete, that is,
if $\| A_n - A_m \| \to 0$ as $n, m \to \infty$, then there is an $A$ such that
$\| A_n - A \| \to 0$. The proof of this fact is reduced to the corresponding
fact for vectors. If $\| A_n - A_m \| \to 0$, then $\| A_n x - A_m x \| \to 0$ for
each $x$, so that we may find a vector corresponding to $x$, which we may
denote by $Ax$, say, such that $\| A_n x - Ax \| \to 0$. It is clear that the
correspondence from $x$ to $Ax$ is given by a linear transformation $A$; the
implication relation (ii) => (i) proved above completes the proof.

Now that we know what convergence means for linear transformations,
it behooves us to examine some simple functions of these transformations
in order to verify their continuity. We assert that $\| A \|$, $\| Ax \|$, $(Ax, y)$,
$Ax$, $A + B$, $\alpha A$, $AB$, and $A^*$ all define continuous functions of all their
arguments simultaneously. (Observe that the first three are
numerical-valued functions, the next is vector-valued, and the last four are
transformation-valued.) The proofs of these statements are all quite easy, and
similar to each other; to illustrate the ideas we discuss $\| A \|$, $Ax$, and $A^*$.

(1) If $A_n \to A$, that is, $\| A_n - A \| \to 0$, then, since the relations

$$\| A_n \| \le \| A_n - A \| + \| A \|$$

and

$$\| A \| \le \| A - A_n \| + \| A_n \|$$

imply that

$$\bigl| \| A_n \| - \| A \| \bigr| \le \| A_n - A \|,$$

we see that $\| A_n \| \to \| A \|$.

(2) If $A_n \to A$ and $x_n \to x$, then

$$\| A_n x_n - Ax \| \le \| A_n x_n - A x_n \| + \| A x_n - Ax \| \to 0,$$

so that $A_n x_n \to Ax$.

(3) If $A_n \to A$, then, for each $x$ and $y$,

$$(A_n^* x, y) = (x, A_n y) = \overline{(A_n y, x)} \to \overline{(Ay, x)}
= \overline{(y, A^* x)} = (A^* x, y),$$

whence $A_n^* \to A^*$.


EXERCISES 

1. A sequence $(A_n)$ of linear transformations converges to a linear transformation
$A$ if and only if, for every coordinate system, each entry in the matrix of $A_n$
converges, as $n \to \infty$, to the corresponding entry in the matrix of $A$.

2. For every linear transformation $A$ there exists a sequence $(A_n)$ of invertible
linear transformations such that $A_n \to A$.

3. If $E$ and $F$ are perpendicular projections, then $(EFE)^n$ converges, as $n \to \infty$,
to the projection whose range is the intersection of the ranges of $E$ and $F$.

4. If $A$ is a linear transformation on a finite-dimensional unitary space, then a
necessary and sufficient condition that $A^n \to 0$ is that all the proper values of
$A$ be (strictly) less than 1 in absolute value.

5. Prove that if $A$ is the $n$-by-$n$ matrix

$$A = \begin{pmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & & & & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
\frac{1}{n} & \frac{1}{n} & \frac{1}{n} & \cdots & \frac{1}{n}
\end{pmatrix},$$

then $A^k$ converges, as $k \to \infty$, to a projection whose range is one-dimensional;
find the range.

6. Prove that det and tr are continuous.

§ 92. Ergodic theorem 

The routine work is out of the way; we go on to illustrate the general 
theory by considering some very special but quite important convergence 
problems. 
Theorem. If $U$ is an isometry on a finite-dimensional inner product
space, and if $\mathcal{M}$ is the subspace of all solutions of $Ux = x$, then the sequence
defined by

$$V_n = \frac{1}{n}(1 + U + \dots + U^{n-1})$$

converges as $n \to \infty$ to the perpendicular projection $E = P_{\mathcal{M}}$.

proof. Let $\mathcal{N}$ be the range of the linear transformation $1 - U$. If
$x = y - Uy$ is in $\mathcal{N}$, then

$$V_n x = \frac{1}{n}(y - Uy + Uy - U^2 y + \dots + U^{n-1} y - U^n y)
= \frac{1}{n}(y - U^n y),$$

so that

$$\| V_n x \| = \frac{1}{n} \| y - U^n y \| \le \frac{1}{n}(\| y \| + \| U^n y \|) = \frac{2}{n} \| y \|.$$

This implies that $V_n x$ converges to zero when $x$ is in $\mathcal{N}$.

On the other hand, if $x$ is in $\mathcal{M}$, that is, $Ux = x$, then $V_n x = x$, so that
in this case $V_n x$ certainly converges to $x$.

We shall complete the proof by showing that $\mathcal{N}^\perp = \mathcal{M}$. (This will
imply that every vector is a sum of two vectors for which $(V_n)$ converges,
so that $(V_n)$ converges everywhere. What we have already proved about
the limit of $(V_n)$ in $\mathcal{M}$ and in $\mathcal{N}$ shows that $(V_n x)$ always converges to the
projection of $x$ in $\mathcal{M}$.) To show that $\mathcal{N}^\perp = \mathcal{M}$, we observe that $x$ is in the
orthogonal complement of $\mathcal{N}$ if and only if $(x, y - Uy) = 0$ for all $y$.
This in turn implies that

$$0 = (x, y - Uy) = (x, y) - (x, Uy) = (x, y) - (U^* x, y) = (x - U^* x, y),$$

that is, that $x - U^* x = x - U^{-1} x$ is orthogonal to every vector $y$, so
that $x - U^{-1} x = 0$, $x = U^{-1} x$, or $Ux = x$. Reading the last computation
from right to left shows that this necessary condition is also sufficient;
we need only to recall the definition of $\mathcal{M}$ to see that $\mathcal{M} = \mathcal{N}^\perp$.

This very ingenious proof, which works with only very slight modifications
in most of the important infinite-dimensional cases, is due to F. Riesz.
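
A numerical sketch of the ergodic theorem, in plain Python, may be useful.
The isometry chosen here is an assumption made for illustration: on real
3-space it rotates the first two coordinates through one radian and fixes the
third, so that the solutions of $Ux = x$ form the third axis. The averages $V_n$
should then approach the perpendicular projection on that axis.

```python
import math

def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def ergodic_average(u, n):
    """V_n = (1/n)(1 + U + ... + U^(n-1)) for a square matrix u."""
    size = len(u)
    identity = [[1.0 if i == j else 0.0 for j in range(size)]
                for i in range(size)]
    total = [[0.0] * size for _ in range(size)]
    power = identity
    for _ in range(n):
        total = mat_add(total, power)
        power = mat_mul(power, u)
    return [[entry / n for entry in row] for row in total]

# Hypothetical isometry: rotation by one radian in the first two
# coordinates, identity on the third.
c, s = math.cos(1.0), math.sin(1.0)
U = [[c, -s, 0.0],
     [s, c, 0.0],
     [0.0, 0.0, 1.0]]

V = ergodic_average(U, 2000)
E = [[0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]  # perpendicular projection on the third axis
max_error = max(abs(V[i][j] - E[i][j]) for i in range(3) for j in range(3))
print(max_error)
```

The error decays like $1/n$, in agreement with the estimate $\| V_n x \| \le (2/n) \| y \|$
from the proof.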
§ 93. Power series

We consider next the so-called Neumann series $\sum_{n=0}^{\infty} A^n$, where $A$ is
a linear transformation with norm $< 1$ on a finite-dimensional vector space.
If we write

$$S_p = \sum_{n=0}^{p} A^n,$$

then

$$(1 - A) S_p = S_p - A S_p = 1 - A^{p+1}. \tag{1}$$

To prove that $S_p$ has a limit as $p \to \infty$, we consider (for any two indices
$p$ and $q$ with $p > q$)

$$\| S_p - S_q \| \le \sum_{n=q+1}^{p} \| A^n \| \le \sum_{n=q+1}^{p} \| A \|^n.$$

Since $\| A \| < 1$, the last written quantity approaches zero as $p, q \to \infty$;
it follows that $S_p$ has a limit $S$ as $p \to \infty$. To evaluate the limit we observe
that $1 - A$ is invertible. (Proof: $(1 - A)x = 0$ implies that $Ax = x$,
and, if $x \ne 0$, this implies that $\| Ax \| = \| x \| > \| A \| \cdot \| x \|$, a
contradiction.) Hence we may write (1) in the form

$$S_p = (1 - A^{p+1})(1 - A)^{-1} = (1 - A)^{-1}(1 - A^{p+1}); \tag{2}$$

since $A^{p+1} \to 0$ as $p \to \infty$, it follows that $S = (1 - A)^{-1}$.
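
The convergence of the Neumann series is easy to observe numerically. The
sketch below (plain Python, 2-by-2 real case) forms the partial sum $S_{60}$ and
compares it with $(1 - A)^{-1}$ computed from the explicit 2-by-2 inverse
formula. The matrix and the number of terms are assumptions made for
illustration.

```python
def mat_mul(a, b):
    """Product of two 2-by-2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

IDENTITY = [[1.0, 0.0], [0.0, 1.0]]

# Hypothetical example. Its norm is at most the square root of the sum of
# the squared entries, sqrt(0.04 + 0.01 + 0.09 + 0.16) < 1, so ||A|| < 1.
A = [[0.2, 0.1],
     [0.3, 0.4]]

def neumann_partial_sum(a, p):
    """S_p = 1 + A + ... + A^p."""
    total, power = IDENTITY, IDENTITY
    for _ in range(p):
        power = mat_mul(power, a)
        total = mat_add(total, power)
    return total

# (1 - A)^(-1) from the 2-by-2 inverse formula.
m = [[1.0 - A[0][0], -A[0][1]],
     [-A[1][0], 1.0 - A[1][1]]]
det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
inverse = [[m[1][1] / det, -m[0][1] / det],
           [-m[1][0] / det, m[0][0] / det]]

S = neumann_partial_sum(A, 60)
gap = max(abs(S[i][j] - inverse[i][j]) for i in range(2) for j in range(2))
print(gap)
```

The gap is governed by the tail $\sum_{n > p} \| A \|^n$, which is why a few dozen terms
already agree with the inverse to rounding accuracy for this matrix.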

As another example of an infinite series of transformations we consider
the exponential series. For an arbitrary linear transformation $A$ (not
necessarily with $\| A \| < 1$) we write

$$S_p = \sum_{n=0}^{p} \frac{1}{n!} A^n.$$

Since we have

$$\| S_p - S_q \| \le \sum_{n=q+1}^{p} \frac{1}{n!} \| A \|^n,$$

and since the right side of this inequality, being a part of the power series
for $\exp \| A \| = e^{\| A \|}$, converges to 0 as $p, q \to \infty$, we see that there is a
linear transformation $S$ such that $S_p \to S$. We write $S = \exp A$; we shall
merely mention some of the elementary properties of this function of $A$.

Consideration of the triangular forms of $A$ and of $S_p$ shows that the
proper values of $\exp A$, together with their algebraic multiplicities, are
equal to the exponentials of the proper values of $A$. (This argument, as
well as some of the ones that follow, applies directly to the complex case
only; the real case has to be deduced via complexification.) From the
consideration of the triangular form it follows also that the determinant of
$\exp A$, that is, $\prod_{i=1}^{N} \exp \lambda_i$, where $\lambda_1, \dots, \lambda_N$ are the (not necessarily
distinct) proper values of $A$, is the same as $\exp(\lambda_1 + \dots + \lambda_N) = \exp(\operatorname{tr} A)$.
Since $\exp \zeta \ne 0$, this shows, incidentally, that $\exp A$ is always
invertible.
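
A very small worked instance shows these statements at work. The nilpotent
matrix used below is an assumption chosen for illustration because its
exponential series terminates.

```latex
A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad A^2 = 0, \qquad
\exp A = 1 + A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.
```

The proper values of $A$ are $0, 0$ and those of $\exp A$ are $1, 1 = e^0, e^0$, with the
same algebraic multiplicity; moreover $\det(\exp A) = 1 = \exp(\operatorname{tr} A) = \exp 0$.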

Considered as a function of linear transformations the exponential
retains many of the simple properties of the ordinary numerical exponential
function. Let us, for example, take any two commutative linear
transformations $A$ and $B$. Since $\exp(A + B) - \exp A \exp B$ is the limit (as $p \to \infty$)
of the expression

$$\sum_{n=0}^{p} \frac{1}{n!}(A + B)^n - \sum_{m=0}^{p} \frac{1}{m!} A^m \cdot \sum_{k=0}^{p} \frac{1}{k!} B^k
= \sum_{n=0}^{p} \frac{1}{n!} \sum_{j=0}^{n} \binom{n}{j} A^j B^{n-j}
- \sum_{m=0}^{p} \sum_{k=0}^{p} \frac{1}{m!\,k!} A^m B^k,$$

we will have proved the multiplication rule for exponentials when we have
proved that this expression converges to zero. (Here $\binom{n}{j}$ stands for the
combinatorial coefficient $\frac{n!}{j!\,(n - j)!}$.) An easy verification yields the fact
that for $k + m \le p$ the product $A^m B^k$ occurs in both terms of the last
written expression with coefficients that differ in sign only. The terms
that do not cancel out are all in the subtrahend and are together equal to

$$\sum \sum \frac{1}{m!\,k!} A^m B^k,$$

the summation being extended over those values of $m$ and $k$ that are $\le p$
and for which $m + k > p$. Since $m + k > p$ implies that at least one
of the two integers $m$ and $k$ is greater than the integer part of $\frac{p}{2}$ (in
symbols $[\frac{p}{2}]$), the norm of this remainder is dominated by

$$\Bigl(\sum_{m=0}^{\infty} \frac{1}{m!} \| A \|^m\Bigr)\Bigl(\sum_{k > [p/2]} \frac{1}{k!} \| B \|^k\Bigr)
+ \Bigl(\sum_{k=0}^{\infty} \frac{1}{k!} \| B \|^k\Bigr)\Bigl(\sum_{m > [p/2]} \frac{1}{m!} \| A \|^m\Bigr)
= (\exp \| A \|)\alpha_p + (\exp \| B \|)\beta_p,$$

where $\alpha_p \to 0$ and $\beta_p \to 0$ as $p \to \infty$.
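
The multiplication rule can be checked on partial sums. In the sketch below
(plain Python, 2-by-2 real case) the commuting pair is obtained by taking $B$
to be a polynomial in $A$; a non-commuting pair is included only for contrast,
since the argument above makes no claim about it. The matrices, the number
of terms, and the helper names are assumptions made for illustration.

```python
def mat_mul(a, b):
    """Product of two 2-by-2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def mat_exp(a, terms=40):
    """Partial sum S_p of the exponential series, with p = terms."""
    total = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms + 1):
        # term becomes A^n / n! by multiplying the previous term by A / n.
        term = [[entry / n for entry in row] for row in mat_mul(term, a)]
        total = mat_add(total, term)
    return total

def max_gap(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))

# Commuting pair (hypothetical): B is a polynomial in A, namely B = A * A.
A = [[0.5, 1.0], [-1.0, 0.25]]
B = mat_mul(A, A)
commuting_gap = max_gap(mat_exp(mat_add(A, B)),
                        mat_mul(mat_exp(A), mat_exp(B)))

# Non-commuting pair, for contrast only.
C = [[0.0, 1.0], [0.0, 0.0]]
D = [[0.0, 0.0], [1.0, 0.0]]
other_gap = max_gap(mat_exp(mat_add(C, D)),
                    mat_mul(mat_exp(C), mat_exp(D)))
print(commuting_gap, other_gap)
```

For the commuting pair the two sides agree to rounding accuracy; for the
other pair they visibly differ, which shows that the commutativity hypothesis
is doing real work.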
Similar methods serve to treat $f(A)$, where $f$ is any function representable
by a power series,

$$f(\zeta) = \sum_{n=0}^{\infty} \alpha_n \zeta^n,$$

and where $\| A \|$ is (strictly) smaller than the radius of convergence of the
series. We leave it to the reader to verify that the functional calculus
we are here hinting at is consistent with the functional calculus for normal
transformations. Thus, for example, $\exp A$ as defined above is the same
linear transformation as is defined by our previous notion of $\exp A$ in case
$A$ is normal.


EXERCISES 

1. Give an alternative proof of the ergodic theorem, based on the spectral
theorem for unitary transformations.

2. Prove that if $\| 1 - A \| < 1$, then $A$ is invertible, by considering the formal
power series expansion of $(1 - (1 - A))^{-1}$.
APPENDIX

HILBERT SPACE


Probably the most useful and certainly the best developed generalization 
of the theory of finite-dimensional inner product spaces is the theory of 
Hilbert space. Without going into details and entirely without proofs 
we shall now attempt to indicate how this generalization proceeds and 
what are the main difficulties that have to be overcome. 

The definition of Hilbert space is easy: it is an inner product space
satisfying one extra condition. That this condition (namely, completeness) is
automatically satisfied in the finite-dimensional case is proved in
elementary analysis. In the infinite-dimensional case it may be possible that
for a sequence $(x_n)$ of vectors $\| x_n - x_m \| \to 0$ as $n, m \to \infty$, but still
there is no vector $x$ for which $\| x_n - x \| \to 0$; the only effective way of
ruling out this possibility is explicitly to assume its opposite. In other
words: a Hilbert space is a complete inner product space. (Sometimes the
concept of Hilbert space is restricted by additional conditions, whose
purpose is to limit the size of the space from both above and below. The
most usual conditions require that the space be infinite-dimensional and
separable. In recent years, ever since the realization that such additional
restrictions do not pay for themselves in results, it has become customary
to use “Hilbert space” for the concept we defined.)

It is easy to see that the space $\mathcal{P}$ of polynomials with the inner product
defined by $(x, y) = \int_0^1 x(t)\overline{y(t)}\,dt$ is not complete. In connection with the
completeness of certain particular Hilbert spaces there is quite an extensive
mathematical lore. Thus, for instance, the main assertion of the celebrated
Riesz-Fischer theorem is that a Hilbert space manufactured out of the
set of all those functions $x$ for which $\int_0^1 | x(t) |^2\,dt < \infty$ (in the sense of
Lebesgue integration) is a Hilbert space (with formally the same definition
of inner product as for polynomials). Another popular Hilbert space,
reminiscent in its appearance of finite-dimensional coordinate space, is
the space of all those sequences $(\xi_n)$ of numbers (real or complex, as the
case may be) for which $\sum_n | \xi_n |^2$ converges.

Using completeness in order to discuss intelligently the convergence of
some infinite sums, one can proceed for quite some time in building the
theory of Hilbert spaces without meeting any difficulties due to
infinite-dimensionality. Thus, for instance, the notions of orthogonality and of
complete orthonormal sets can be defined in the general case exactly as we
defined them. Our proofs of Bessel's inequality and of the equivalence of
the various possible formulations of completeness for orthonormal sets
have to undergo slight verbal changes only. (The convergence of the
various infinite sums that enter is an automatic consequence of Bessel's
inequality.) Our proof of Schwarz's inequality is valid, as it stands, in
the most general case. Finally, the proof of the existence of complete
orthonormal sets parallels closely the proof in the finite case. In the
unconstructive proof Zorn's lemma (or transfinite induction) replaces
ordinary induction, and even the constructive steps of the Gram-Schmidt
process are easily carried out.

In the discussion of manifolds, functionals, and transformations the
situation becomes uncomfortable if we do not make a concession to the
topology of Hilbert space. Good generalizations of all our statements for
the finite-dimensional case can be proved if we consider closed linear
manifolds, continuous linear functionals, and bounded linear transformations.
(In a finite-dimensional space every linear manifold is closed, every linear
functional is continuous, and every linear transformation is bounded.) If,
however, we do agree to make these concessions, then once more we can
coast on our finite-dimensional proofs without any change most of the
time, and with only the insertion of an occasional $\epsilon$ the rest of the time.
Thus once more we obtain that $\mathcal{V} = \mathcal{M} \oplus \mathcal{M}^\perp$, that $\mathcal{M} = \mathcal{M}^{\perp\perp}$, and that
every linear functional of $x$ has the form $(x, y)$; our definitions of
self-adjoint and of positive transformations still make sense, and all our
theorems about perpendicular projections (as well as their proofs) carry over
without change.

The first hint of how things can go wrong comes from the study of
orthogonal and unitary transformations. We still call a transformation $U$
orthogonal or unitary (according as the space is real or complex) if
$UU^* = U^*U = 1$, and it is still true that such a transformation is isometric,
that is, that $\| Ux \| = \| x \|$ for all $x$, or, equivalently, $(Ux, Uy) = (x, y)$
for all $x$ and $y$. It is, however, easy to construct an isometric
transformation that is not unitary; because of its importance in the construction of
counterexamples we shall describe one such transformation. We consider
a Hilbert space in which there is a countable complete orthonormal set,
say $\{x_0, x_1, x_2, \dots\}$. A unique bounded linear transformation $U$ is
defined by the conditions $Ux_n = x_{n+1}$ for $n = 0, 1, 2, \dots$. This $U$ is
isometric ($U^*U = 1$), but, since $UU^*x_0 = 0$, it is not true that $UU^* = 1$.
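
The behaviour of this transformation can be imitated on finitely supported
sequences. The sketch below (plain Python) is only a model under stated
assumptions: a vector $\sum_n \xi_n x_n$ is represented by the list of its coordinates,
$U$ prepends a zero, and $U^*$ is taken to be the map that drops the first
coordinate. The function names are chosen here for illustration.

```python
def shift(x):
    """U: sends (xi_0, xi_1, ...) to (0, xi_0, xi_1, ...), so U x_n = x_(n+1)."""
    return [0.0] + list(x)

def shift_adjoint(x):
    """U*: sends (xi_0, xi_1, xi_2, ...) to (xi_1, xi_2, ...)."""
    return list(x[1:])

def same_vector(x, y):
    """Compare finitely supported sequences, ignoring trailing zeros."""
    length = max(len(x), len(y))
    padded_x = list(x) + [0.0] * (length - len(x))
    padded_y = list(y) + [0.0] * (length - len(y))
    return padded_x == padded_y

def norm_squared(x):
    return sum(t * t for t in x)

x0 = [1.0]              # the basis vector x_0
x = [3.0, -1.0, 2.0]    # an arbitrary finitely supported vector

isometric = norm_squared(shift(x)) == norm_squared(x)
u_star_u_fixes_x = same_vector(shift_adjoint(shift(x)), x)
u_u_star_kills_x0 = same_vector(shift(shift_adjoint(x0)), [0.0])
u_u_star_fixes_x = same_vector(shift(shift_adjoint(x)), x)
print(isometric, u_star_u_fixes_x, u_u_star_kills_x0, u_u_star_fixes_x)
# prints: True True True False
```

The last value is False because $UU^*$ discards the $x_0$ coordinate, which is
exactly the failure of $UU^* = 1$ described above.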

It is when we come to spectral theory that the whole flavor of the
development changes radically. The definition of proper value as a number $\lambda$
for which $Ax = \lambda x$ has a non-zero solution still makes sense, and our
theorem about the reality of the proper values of a self-adjoint transformation
is still true. The notion of proper value loses, however, much of its
significance. Proper values are so very useful in the finite-dimensional case
because they are a handy way of describing the fact that something goes
wrong with the inverse of $A - \lambda$, and the only thing that can go wrong is
that the inverse refuses to exist. Essentially different things can happen
in the infinite-dimensional case; just to illustrate the possibilities, we
mention, for example, that the inverse of $A - \lambda$ may exist but be
unbounded. That there is no useful generalization of determinant, and
hence of the characteristic equation, is the least of our worries. The
whole theory has, in fact, attained its full beauty and maturity only after
the slavish imitation of such finite-dimensional methods was given up.

After some appreciation of the fact that the infinite-dimensional case 
has to overcome great difficulties, it comes as a pleasant surprise that the 
spectral theorem for self-adjoint transformations (and, in the complex 
case, even for normal ones) does have a very beautiful and powerful 
generalization. (Although we describe the theorem for bounded trans- 
formations only, there is a large class of unbounded ones for which it is 
valid.) In order to be able to understand the analogy, let us re-examine 
the finite-dimensional case. 

Let $A$ be a self-adjoint linear transformation on a finite-dimensional
inner product space, and let $A = \sum_j \lambda_j F_j$ be its spectral form. If $M$ is
an interval in the real axis, we write $E(M)$ for the sum of all those $F_j$ for
which $\lambda_j$ belongs to $M$. It is clear that $E(M)$ is a perpendicular projection
for each $M$. The following properties of the projection-valued
interval-function $E$ are the crucial ones: if $M$ is the union of a countable collection
$\{M_n\}$ of disjoint intervals, then

$$E(M) = \sum_n E(M_n), \tag{1}$$

and if $M$ is the improper interval consisting of all real numbers, then
$E(M) = 1$. The relation between $A$ and $E$ is described by the equation

$$A = \sum_j \lambda_j E(\{\lambda_j\}),$$

where, of course, $\{\lambda_j\}$ is the degenerate interval consisting of the single
number $\lambda_j$. Those familiar with Lebesgue-Stieltjes integration will
recognize the last written sum as a typical approximating sum to an integral of
the form $\int \lambda\,dE(\lambda)$ and will therefore see how one may expect the
generalization to go. The algebraic concept of summation is to be replaced by
the analytic concept of integration; the generalized relation between $A$ and
$E$ is described by the equation

$$A = \int \lambda\,dE(\lambda). \tag{2}$$

Except for this formal alteration, the spectral theorem for self-adjoint 
transformations is true in Hilbert space. We have, of course, to interpret 
correctly the meaning of the limiting operations involved in (1) and (2). 
Once more we are faced with the three possibilities mentioned in § 91. 
They are called uniform, strong, and weak convergence respectively, and 
it turns out that both (1) and (2) may be given the strong interpretation. 
(The reader deduces, of course, from our language that in an infinite-di- 
mensional Hilbert space the three possibilities are indeed distinct.) 
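
As a small illustration of the finite-dimensional statements, consider a self-adjoint transformation on a two-dimensional space with proper values 1 and 3. The matrix below is a hypothetical example chosen for convenience; it does not come from the text. The fragment assumes the amsmath package and writes out the spectral form together with a few values of the interval-function E.

```latex
% Illustrative example only; the matrix A is an assumption, not from the text.
\[
A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad
\lambda_1 = 1, \quad \lambda_2 = 3,
\]
% Perpendicular projections on the two proper subspaces, and the spectral form.
\[
F_1 = \tfrac12 \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}, \qquad
F_2 = \tfrac12 \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \qquad
A = 1 \cdot F_1 + 3 \cdot F_2 .
\]
% E(M) is the sum of those F_j whose proper value lies in M.
\[
E([0,2)) = F_1, \qquad E([2,4]) = F_2, \qquad
E([0,4]) = E([0,2)) + E([2,4]) = 1, \qquad E([4,5]) = 0 .
\]
```

Here the interval [0,4] is the disjoint union of [0,2) and [2,4], so the third relation is an instance of (1), and the spectral form itself is the sum $\lambda_1 E(\{\lambda_1\}) + \lambda_2 E(\{\lambda_2\})$.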

We have seen that the projections $F_j$ entering into the spectral form of A 
in the finite-dimensional case are very simple functions of A (§ 82). Since 
the E(M) are obtained from the $F_j$ by summation, they also are functions 
of A, and it is quite easy to describe what functions. We write $g_M(\zeta) = 1$ 
if $\zeta$ is in M and $g_M(\zeta) = 0$ otherwise; then $E(M) = g_M(A)$. This fact 
gives the main clue to a possible proof of the general spectral theorem. 
The usual process is to discuss the functional calculus for polynomials, 
and, by limiting processes, to extend it to a class of functions that includes 
all the functions $g_M$. Once this is done, we may define the interval- 
function E by writing $E(M) = g_M(A)$; there is no particular difficulty in 
establishing that E and A satisfy (1) and (2). 
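
A minimal worked instance may make the relation between the characteristic function of M and the projection E(M) concrete. The matrix, the interval, and the polynomial p below are assumptions chosen for illustration, not taken from the text. In finite dimensions a function of A depends only on its values at the proper values of A, so the characteristic function of M may be replaced by any polynomial that agrees with it there. The fragment assumes the amsmath package.

```latex
% Illustrative example only; A, M, and p are assumptions, not from the text.
\[
A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}
\quad \text{(proper values $1$ and $3$)}, \qquad M = [0,2), \qquad
p(\zeta) = \frac{3 - \zeta}{2} .
\]
% p takes the value 1 at the proper value inside M and 0 at the one outside M.
\[
p(1) = 1, \qquad p(3) = 0 ,
\]
% Hence p(A) is the projection on the proper subspace belonging to 1.
\[
p(A) = \tfrac12 \,(3 \cdot 1 - A)
     = \tfrac12 \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} .
\]
```

The matrix obtained is the perpendicular projection on the proper subspace belonging to the proper value 1, which is the sum of the spectral projections whose proper values lie in M. In Hilbert space no single polynomial need do this job, which is why the extension by limiting processes is required.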

After the spectral theorem is proved, it is easy to deduce from it the 
generalized versions of our theorems concerning square roots, the functional 
calculus, the polar decomposition, and properties of commutativity, and, 
in fact, to answer practically every askable question about bounded normal 
transformations. 

The chief difficulties that remain are the considerations of non-normal 
and of unbounded transformations. Concerning general non-normal trans- 
formations, it is quite easy to describe the state of our knowledge; it is 
non-existent. No even unsatisfactory generalization exists for the tri- 
angular form or for the Jordan canonical form and the theory of elementary 
divisors. Very different is the situation concerning normal (and par- 
ticularly self-adjoint) unbounded transformations. (The reader will 
sympathize with the desire to treat such transformations if he recalls 
that the first and most important functional operation that most of us 
learn is differentiation.) In this connection we shall barely hint at the 
main obstacle the theory faces. It is not very difficult to show that 
if a self-adjoint linear transformation is defined for all vectors of Hilbert 
space, then it is bounded. In other words, the first requirement con- 
cerning transformations that we are forced to give up is that they be de- 
fined everywhere. The discussion of the precise domain on which a self- 
adjoint transformation may be defined and of the extent to which this 
domain may be enlarged is the chief new difficulty encountered in the 
study of unbounded transformations. 